Research Article

Austin Biom and Biostat. 2015;2(2): 1016.

# Follow-up Design for Comparing Two Binary Diagnostic Tests

Kenta Murotani¹*, Akihiro Hirakawa², Yoshiko Aoyama³ and Takashi Yanagawa³

¹Center for Clinical Research, Aichi Medical University, Japan

²Center for Advanced Medicine and Clinical Research, Nagoya University Hospital, Japan

³The Biostatistics Center, Kurume University, Japan

**Corresponding author:** Kenta Murotani, Center for Clinical Research, Aichi Medical University, 1-1 Yazakokarimata, Nagakute, Aichi, Japan.

**Received:** January 06, 2015; **Accepted:** May 11, 2015; **Published:** May 26, 2015

## Abstract

Most conventional methods for comparing two diagnostic tests require patients whose true disease statuses are known. In this paper we address the problem of comparing two binary diagnostic tests (referred to as the new and standard tests) in a follow-up design in which no gold standard is available. Assume that each patient is examined twice by the new test and twice by the standard test. We employ a comparison measure ψ, defined as the ratio of the odds ratios of the new and standard tests. It is not possible to estimate ψ from the full likelihood function based on this design, even if two independent multinomial distributions are assumed for the data. We therefore focus only on data from discordant pairs, in which the new and standard tests disagree. We construct the likelihood conditioned on those pairs and estimate the parameters involved in it; an estimate of ψ is obtained by plugging those estimates into ψ. The asymptotic normality of the estimator of ψ is shown by the delta method, and a confidence interval for ψ is developed. A method of sample size determination for this design is also proposed. Simulations under several scenarios are conducted to study the behavior of the proposed method.

**Keywords:** Follow-up design; Diagnostic test; Comparison; No gold standard

## Introduction

Accurate diagnosis is crucial when planning the treatment of a disease: once an accurate diagnosis has been determined, the patient can begin receiving adequate treatment. The accurate evaluation and selection of diagnostic methods therefore plays an important role in patients' health. A medical method that aims at determining whether a patient is affected by a disease is called a "diagnostic test". In particular, diagnostic tests that report the strength of suspicion of a disease on a binary scale ("positive" or "negative") are called "binary diagnostic tests". To determine which of two binary diagnostic tests is statistically better, their sensitivity and specificity must be closely examined [1,2]. Sensitivity and specificity are defined as Sensitivity = Pr(T=1|D=1) and Specificity = Pr(T=0|D=0), where T denotes the result of the binary diagnostic test and D denotes the actual disease status; T=1 (D=1) indicates a positive result (presence of disease), and T=0 (D=0) a negative result (absence of disease). Sensitivity is the conditional probability that a diseased patient is diagnosed as positive, and specificity is the conditional probability that a non-diseased patient is diagnosed as negative. In both cases, values closer to 1 indicate a more accurate test. If each patient's actual disease status D is observed, sensitivity and specificity can be estimated simply as sample proportions.

However, an accurate observation of the value of D involves methods that are often invasive for the patient. In the case of cancers, for example, the value of D can be assessed only by collecting cell samples through biopsy or surgery, and by determining the diagnosis in a comprehensive manner by using pathological and histological methods.

An example of an actual trial is that of Berg et al. [3], who performed biopsy in patients at elevated risk of breast cancer to determine a definitive diagnosis, in order to examine whether "mammography alone" and "mammography combined with ultrasound" were effective as diagnostic tests for breast cancer detection. Similarly, in Japan, a large-scale randomized controlled trial of breast cancer screening methods (mammography alone vs. mammography combined with ultrasound) is being conducted on 100,000 women in their 40s [4]. In this trial, the definitive diagnosis is determined by biopsy or surgery for patients whose overall screening results indicate a need for thorough examination. These examples highlight an important issue: when sensitivity and specificity are evaluated directly for comparison, information on the definitive diagnosis is required, and obtaining it imposes a large burden on both patients and health care workers.

Therefore, in this paper, we propose a methodology for comparing two binary diagnostic tests (referred to hereinafter as the "new test" and the "standard test") in the absence of a definitive diagnosis, and discuss a follow-up design based on this methodology. The key feature of the method is that each patient undergoes both the new test and the standard test twice within a short period, and attention is restricted to pairs in which the new and standard tests give discordant results. The paper is organized as follows: Section 2 summarizes the criteria for comparing two diagnostic tests; Section 3 proposes the methodology; Section 4 presents numerical simulations under several scenarios; and Section 5 provides a discussion.

## Comparison measure

Let T_{N}, T_{S} ∈ {0,1} be random variables representing the results of the two binary diagnostic tests, namely the new test and the standard test. Murotani et al. [5] previously summarized the criteria for comparing the standard test and the new test as (C1), (C2), (C3) and (C4), as follows:

(C1) Pr(T_{N}=1 |D=1) > Pr(T_{S}=1 |D=1) and Pr(T_{N}=0 |D=0) >
Pr(T_{S}=0 |D=0),

(C2) Pr(D=1 |T_{N}=1) > Pr(D=1 |T_{S}=1) and Pr(D=0 |T_{N}=0) > Pr(D=0 |T_{S}=0),

(C3) Pr(T_{N}=1 |D=1) + Pr(T_{N}=0 |D=0) > Pr(T_{S}=1 |D=1) + Pr(T_{S}=0 |D=0), and

$\left(\text{C4}\right)\frac{\mathrm{Pr}\left({\text{T}}_{\text{N}}=1|\text{D=1}\right)\text{Pr}\left({\text{T}}_{\text{N}}=0|\text{D=0}\right)}{\mathrm{Pr}\left({\text{T}}_{\text{N}}=0|\text{D=1}\right)\text{Pr}\left({\text{T}}_{\text{N}}=1|\text{D=0}\right)}>$$\frac{\mathrm{Pr}\left({\text{T}}_{\text{S}}=1|\text{D=1}\right)\text{Pr}\left({\text{T}}_{\text{S}}=0|\text{D=0}\right)}{\mathrm{Pr}\left({\text{T}}_{\text{S}}=0|\text{D=1}\right)\text{Pr}\left({\text{T}}_{\text{S}}=1|\text{D=0}\right)}.$

(C1) compares the tests on the basis of sensitivity and specificity. In (C2), the conditioning is reversed relative to (C1): the comparison is based on the probability that the patient's actual status is "presence of disease" ("absence of disease") given a positive (negative) diagnosis, so the tests are compared on their capability to predict the diagnosis. (C3) compares the tests by the sum of sensitivity and specificity; this is equivalent to selecting the diagnostic test with the larger Area Under the Curve (AUC). (C4) compares the tests by the odds ratios of the new and standard tests.

The meaning of the (C4) criterion is as follows. When T_{N}=1, the predictive capacity is expressed as

O_{1}=Pr(D=1 |T_{N}=1)/Pr(D=0 |T_{N}=1).

When T_{N}=0, the predictive capacity is expressed as

O_{2}=Pr(D=0 |T_{N}=0)/Pr(D=1 |T_{N}=0).

The greater the predictive value of T_{N}=1, the greater the value of O_{1}; the greater the predictive value of T_{N}=0, the greater the value of O_{2}, in other words, the lower the value of O_{2}^{-1}. Therefore, the ratio O_{1}/O_{2}^{-1} = O_{1}O_{2}, which by Bayes' theorem equals the odds ratio appearing in (C4), expresses the strength of the association between the new test and D; higher values indicate that the new test is a good diagnostic test. The odds ratio of the standard test is defined in the same way, and (C4) compares the odds ratio of the new test with that of the standard test. In this paper, the diagnostic tests are compared in the sense of (C4).

The parameter summarizing the (C4) criterion is defined by the following equation:

$\text{\psi =}\frac{\mathrm{Pr}\left({\text{T}}_{\text{N}}=1|\text{D=1}\right)\text{Pr}\left({\text{T}}_{\text{N}}=0|\text{D=0}\right)}{\mathrm{Pr}\left({\text{T}}_{\text{N}}=0|\text{D=1}\right)\text{Pr}\left({\text{T}}_{\text{N}}=1|\text{D=0}\right)}/$$\frac{\mathrm{Pr}\left({\text{T}}_{\text{S}}=1|\text{D=1}\right)\text{Pr}\left({\text{T}}_{\text{S}}=0|\text{D=0}\right)}{\mathrm{Pr}\left({\text{T}}_{\text{S}}=0|\text{D=1}\right)\text{Pr}\left({\text{T}}_{\text{S}}=1|\text{D=0}\right)}.$

According to the (C4) criterion, the following interpretations can be made depending on the value of ψ: ψ > 1 if T_{N} is superior to T_{S}; ψ = 1 if T_{N} and T_{S} are equivalent; and ψ < 1 if T_{N} is inferior to T_{S}.

Thus, ψ is a criterion for comparing the two diagnostic tests.

If ψ can be estimated from the data, the two binary diagnostic tests can be compared on the basis of the estimate. In addition, if the distribution of the estimator of ψ can be derived, hypothesis tests about ψ and confidence intervals can be constructed, and a follow-up design for comparing two binary diagnostic tests, including sample size planning, can be proposed. In the absence of a definitive diagnosis (i.e., without observing D), and based on the data obtained by applying the new test and the standard test twice to each patient, we derive an estimator of ψ and its asymptotic distribution under several assumptions. The following sections present the methodology in concrete terms.
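As a concrete illustration, ψ can be computed directly from the four conditional probabilities appearing in (C4). The sketch below is not from the paper; the numerical accuracies are hypothetical, and each test is assumed to be summarized by its sensitivity and specificity:

```python
def odds_ratio(sens, spec):
    """Diagnostic odds ratio: Pr(T=1|D=1)Pr(T=0|D=0) / {Pr(T=0|D=1)Pr(T=1|D=0)}."""
    return (sens * spec) / ((1.0 - sens) * (1.0 - spec))

def psi(sens_new, spec_new, sens_std, spec_std):
    """Comparison measure psi: ratio of the odds ratios of the new and standard tests."""
    return odds_ratio(sens_new, spec_new) / odds_ratio(sens_std, spec_std)

# Hypothetical accuracies: the new test is somewhat more sensitive and specific.
print(psi(0.80, 0.90, 0.70, 0.85))   # > 1, so the new test is superior under (C4)
```

When the two tests have identical sensitivity and specificity, the function returns exactly 1, matching the interpretation of ψ = 1 above.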

## Methodology

## Notation and definition

Let {T_{Nij} (T_{Sij}), i=1,2,…,n, j=1,2} be random variables representing the diagnostic results of the new test (standard test) administered to the i^{th} patient for the j^{th} time, and let {D_{i}, i=1,2,…,n} be random variables representing the actual disease status of the i^{th} individual. D_{i} does not depend on j; that is, the actual disease status is assumed to remain unchanged between the first and second applications of the new and standard tests. This can be ensured by applying the two diagnostic tests within a relatively short period, during which the actual condition of the disease does not change. D_{i} is an unobserved random variable. T_{Nij}, T_{Sij}, and D_{i} are binary random variables in which 1 means positive (diseased) and 0 means negative (not diseased). In addition, it is assumed that p = Pr(D_{i}=1) for all i; the value p represents the prevalence.

If $({\epsilon}_{Ni1},{\epsilon}_{Si1},{\epsilon}_{Ni2},{\epsilon}_{Si2})$ denote the observed values of $({T}_{Ni1},{T}_{Si1},{T}_{Ni2},{T}_{Si2})$, the data obtained by applying the new and standard tests twice to n patients without a definitive diagnosis are expressed as $({\epsilon}_{Ni1},{\epsilon}_{Si1},{\epsilon}_{Ni2},{\epsilon}_{Si2}),i=1,2,\mathrm{...},n.$

The cell probabilities are defined as ${p}_{ikl}=\mathrm{Pr}({T}_{Nij}=k,{T}_{Sij}=l),\ k,l\in \{0,1\}$. In addition, conditioning on the actual disease status, we define ${q}_{Dkli}=\mathrm{Pr}({T}_{Nij}=k,{T}_{Sij}=l\,|\,D=1)$ and ${q}_{\overline{D}kli}=\mathrm{Pr}({T}_{Nij}=k,{T}_{Sij}=l\,|\,D=0)$ for all i, j, k and l. The probabilities ${p}_{ikl},{q}_{Dkli},{q}_{\overline{D}kli}$ are independent of j; that is, the cell probabilities are assumed to remain unchanged between the first and second diagnostic results.

## Design based approach

In this section, we consider the probability distribution arising from the sampling design and construct the likelihood. The new and standard tests are each applied twice to the i^{th} patient, so the j^{th} (j=1,2) pair of diagnostic results can be summarized in a 2×2 contingency table. Assuming that the two-dimensional random variables representing the first diagnostic results of the new and standard tests, (T_{Ni1},T_{Si1}), and the second results, (T_{Ni2},T_{Si2}), follow mutually independent multinomial distributions, the likelihood for the i^{th} patient can be expressed by the following equation:

${\text{p}}_{\text{i00}}^{\left(1-{\epsilon}_{Ni1}\right)\left(1-{\epsilon}_{Si1}\right)}{\text{p}}_{\text{i01}}^{\left(1-{\epsilon}_{Ni1}\right){\epsilon}_{Si1}}{\text{p}}_{\text{i10}}^{{\epsilon}_{Ni1}\left(1-{\epsilon}_{Si1}\right)}{\text{p}}_{\text{i11}}^{{\epsilon}_{Ni1}{\epsilon}_{Si1}}$

$\times {\text{p}}_{\text{i00}}^{\left(1-{\epsilon}_{Ni2}\right)\left(1-{\epsilon}_{Si2}\right)}{\text{p}}_{\text{i01}}^{\left(1-{\epsilon}_{Ni2}\right){\epsilon}_{Si2}}{\text{p}}_{\text{i10}}^{{\epsilon}_{Ni2}\left(1-{\epsilon}_{Si2}\right)}{\text{p}}_{\text{i11}}^{{\epsilon}_{Ni2}{\epsilon}_{Si2}}.$

In addition, because the actual disease status is unknown, the cell probability p_{ikl} is a mixture with mixing proportion p, namely ${p}_{ikl}=p\,{q}_{Dkli}+\left(1-p\right){q}_{\overline{D}kli}$. In summary, the overall likelihood function L for the n patients is given as follows:

$\text{L=}{\displaystyle \prod _{i=1}^{n}\left\{{(p{q}_{D00i}+(1-p){q}_{\overline{D}00i})}^{{\displaystyle {\sum}_{j=1}^{2}(1-{\epsilon}_{Nij})(1-{\epsilon}_{Sij})}}\right\}}$ $\left\{{(p{q}_{D01i}+(1-p){q}_{\overline{D}01i})}^{{\displaystyle {\sum}_{j=1}^{2}(1-{\epsilon}_{Nij}){\epsilon}_{Sij}}}\right\}$

$\times \left\{{(p{q}_{D10i}+(1-p){q}_{\overline{D}10i})}^{{\displaystyle {\sum}_{j=1}^{2}{\epsilon}_{Nij}(1-{\epsilon}_{Sij})}}\right\}$$\left\{{(p{q}_{D11i}+(1-p){q}_{\overline{D}11i})}^{{\displaystyle {\sum}_{j=1}^{2}{\epsilon}_{Nij}{\epsilon}_{Sij}}}\right\}.$

If ${q}_{D10i}{q}_{\overline{D}01i}/({q}_{D01i}{q}_{\overline{D}10i})$ does not depend on i, and the results of the new and standard tests are mutually independent conditional on the actual disease status, then ψ can be expressed by the following equation:

$\text{\psi =}\frac{{q}_{D10i}{q}_{\overline{D}01i}}{{q}_{D01i}{q}_{\overline{D}10i}}.$

When estimating ψ from the overall likelihood L, it matters whether L belongs to an exponential family. If it does, ψ can be estimated from the conditional likelihood obtained by conditioning on sufficient statistics for the nuisance parameters other than ψ. Unfortunately, L is not an exponential family, and it is therefore difficult to estimate ψ in this way.
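To see why the discordant-pair expression above agrees with the (C4) measure, note that under conditional independence the joint cell probabilities factor into the sensitivities and specificities of the two tests. The numerical check below is a sketch with hypothetical sensitivities and specificities (not values from the paper):

```python
def psi_from_cells(se_n, sp_n, se_s, sp_s):
    """psi computed from the discordant-pair cell probabilities.

    Under conditional independence given D:
      q_D10    = Pr(T_N=1, T_S=0 | D=1) = se_n * (1 - se_s), and so on.
    """
    q_d10 = se_n * (1.0 - se_s)          # discordant cells given D = 1
    q_d01 = (1.0 - se_n) * se_s
    q_dbar10 = (1.0 - sp_n) * sp_s       # discordant cells given D = 0
    q_dbar01 = sp_n * (1.0 - sp_s)
    return (q_d10 * q_dbar01) / (q_d01 * q_dbar10)

def psi_from_odds_ratios(se_n, sp_n, se_s, sp_s):
    """psi as the ratio of the two diagnostic odds ratios, as in (C4)."""
    or_n = (se_n * sp_n) / ((1.0 - se_n) * (1.0 - sp_n))
    or_s = (se_s * sp_s) / ((1.0 - se_s) * (1.0 - sp_s))
    return or_n / or_s

# Both routes give the same value (hypothetical inputs):
a = psi_from_cells(0.80, 0.90, 0.70, 0.85)
b = psi_from_odds_ratios(0.80, 0.90, 0.70, 0.85)
print(a, b)
```

The two functions are algebraically identical, which is exactly the identity ψ = q_{D10}q_{D̄01}/(q_{D01}q_{D̄10}) used above.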

## Conditional approach

When the overall likelihood is constructed by assuming the multinomial distributions implied by the design, the cell probabilities are mixtures of those of the diseased and non-diseased groups, with the prevalence as the mixing proportion. As a result, the overall likelihood is not an exponential family, and ψ cannot be estimated via sufficient statistics. In this section, we restrict the data used in the analysis and propose a new approach based on a conditional likelihood function.

First, we assume the following (E1):

(E1) The data, in which the results of the new test and standard test were consistent with each other, are not related to the comparison of diagnostic tests.

In other words, (E1) asserts that at the time of analysis, there is no need to take into consideration the data in which the new test and standard test produced the same results. Under assumption (E1), only the pairs in which the diagnostic results of the new and standard tests differed from each other (discordant pairs) are considered. Accordingly, the following sets A, B_{1} and B_{2} are defined according to the occasions on which the new and standard tests disagree:

A={i: (T_{Ni1},T_{Si1}, T_{Ni2}, T_{Si2})= (0,1,0,1),(0,1,1,0),(1,0,0,1),(1,0,1,0)},

B_{1}={i: (T_{Ni1},T_{Si1}, T_{Ni2}, T_{Si2})= (0,1,1,1),(0,1,0,0),(1,0,1,1),(1,0,0,0)},

B_{2}={i: (T_{Ni1},T_{Si1}, T_{Ni2}, T_{Si2})= (1,1,0,1),(0,0,0,1),(1,1,1,0),(0,0,1,0)}.

A represents the set of individuals for whom the results of the new and standard tests differed from each other on both the first and the second occasion. B_{1} (B_{2}) represents the set of individuals for whom the results of the new and standard tests differed from each other only on the first (second) occasion.

For i∈A∪B_{1}∪B_{2}, ${T}_{ij'}^{*}$ is defined by the following equation:

${T}_{ij'}^{*}=\begin{cases}1 & \text{if } ({T}_{Nij'},{T}_{Sij'})=(1,0)\\ 0 & \text{if } ({T}_{Nij'},{T}_{Sij'})=(0,1)\end{cases},\quad j'=1,2,$

where

$\mathrm{Pr}\left({T}_{ij'}^{*}=1\right)=\frac{\mathrm{Pr}({T}_{Nij'}=1,{T}_{Sij'}=0)}{\mathrm{Pr}({T}_{Nij'}=1,{T}_{Sij'}=0)+\mathrm{Pr}({T}_{Nij'}=0,{T}_{Sij'}=1)},$

$\mathrm{Pr}\left({T}_{ij'}^{*}=0\right)=1-\mathrm{Pr}\left({T}_{ij'}^{*}=1\right).\qquad \text{(1)}$

For i∈A, the observed values of ${T}_{i1}^{*},{T}_{i2}^{*}$ are $({\epsilon}_{i1}^{*},{\epsilon}_{i2}^{*})$. In the same manner, for i∈B_{1} the observed value of ${T}_{i1}^{*}$ is ${\epsilon}_{i{B}_{1}}^{*}$, and for i∈B_{2} the observed value of ${T}_{i2}^{*}$ is ${\epsilon}_{i{B}_{2}}^{*}$. In addition, for the i^{th} individual, M_{i} is defined as M_{i}=2 for i∈A and M_{i}=1 for i∈B_{1}∪B_{2}. Furthermore, (A1), (A2) and (A3) are assumed as follows:

(A1) For i∈A, $\mathrm{Pr}({T}_{i1}^{*}={\epsilon}_{i1}^{*},{T}_{i2}^{*}={\epsilon}_{i2}^{*}\,|\,{D}_{i}={\epsilon}_{i})={\displaystyle \prod _{j'=1}^{2}\mathrm{Pr}({T}_{ij'}^{*}={\epsilon}_{ij'}^{*}\,|\,{D}_{i}={\epsilon}_{i})},$

(A2) $\alpha =\mathrm{Pr}({T}_{ij'}^{*}=1\,|\,{D}_{i}=1)$ and $\beta =\mathrm{Pr}({T}_{ij'}^{*}=0\,|\,{D}_{i}=0)$ for $j'=1,2$ and $i\in A\cup {B}_{1}\cup {B}_{2}$, and

(A3) $({T}_{i1}^{*},{T}_{i2}^{*}),\ i\in A$; ${T}_{i{B}_{1}}^{*},\ i\in {B}_{1}$; and ${T}_{i{B}_{2}}^{*},\ i\in {B}_{2}$ are mutually independent.

(A1) assumes that, for the i^{th} individual, ${T}_{i1}^{*}$ and ${T}_{i2}^{*}$ are mutually independent conditional on the actual disease status. Similar assumptions have previously been used by Hui and Walter [6] and Yanagawa and Kasagi [7], and are commonly known as conditional independence. Because this assumption is somewhat strong, Vacek [8] and Torrance-Rynard and Walter [9] have examined the effect of departures from it on the estimation of sensitivity and specificity.

(A2) assumes that the sensitivity and specificity of ${T}_{ij'}^{*}$ are constant and do not depend on i or j'. (A3) assumes that the individuals are mutually independent. The following important relationship holds between ψ and the two parameters α and β:

$\psi =\frac{\alpha \beta}{(1-\alpha )(1-\beta )}\text{(2)}$

This relationship shows that the conditional maximum likelihood estimator of ψ is obtained by plugging the values of α and β that maximize L_{c} into the right-hand side of (2). Under (A1), (A2) and (A3), the conditional likelihood function L_{c} is given by the following equation (Appendix 1):

${\text{L}}_{\text{c}}(p,\alpha ,\beta )={\displaystyle \prod _{i\in A}\left\{(1-p){(1-\beta )}^{{\epsilon}_{i1}^{*}+{\epsilon}_{i2}^{*}}{\beta}^{{M}_{i}-{\epsilon}_{i1}^{*}-{\epsilon}_{i2}^{*}}+p{\alpha}^{{\epsilon}_{i1}^{*}+{\epsilon}_{i2}^{*}}{(1-\alpha )}^{{M}_{i}-{\epsilon}_{i1}^{*}-{\epsilon}_{i2}^{*}}\right\}}$

$\times {\displaystyle \prod _{i\in {B}_{1}}\left\{(1-p){(1-\beta )}^{{\epsilon}_{i{B}_{1}}^{*}}{\beta}^{{M}_{i}-{\epsilon}_{i{B}_{1}}^{*}}+p{\alpha}^{{\epsilon}_{i{B}_{1}}^{*}}{(1-\alpha )}^{{M}_{i}-{\epsilon}_{i{B}_{1}}^{*}}\right\}}$

$\times {\displaystyle \prod _{i\in {B}_{2}}\left\{(1-p){(1-\beta )}^{{\epsilon}_{i{B}_{2}}^{*}}{\beta}^{{M}_{i}-{\epsilon}_{i{B}_{2}}^{*}}+p{\alpha}^{{\epsilon}_{i{B}_{2}}^{*}}{(1-\alpha )}^{{M}_{i}-{\epsilon}_{i{B}_{2}}^{*}}\right\}}.\text{(3)}$
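The conditional maximum likelihood step can be sketched numerically. The code below is not the authors' implementation; it assumes the data have already been reduced to counts of s = Σε* per individual (s = 0,1,2 for i∈A with M_{i}=2; s = 0,1 for i∈B_{1}∪B_{2} with M_{i}=1), maximizes log L_{c} by a simple grid search, and resolves the label-swap ambiguity (p,α,β) ↔ (1−p,1−β,1−α), under which L_{c} is invariant, by assuming p ≤ 0.5:

```python
import math
from math import comb

def neg_log_lc(p, a, b, n_a, n_b):
    """Negative log of the conditional likelihood (3), aggregated over counts.
    n_a[s]: individuals in A with eps*_i1 + eps*_i2 = s (M_i = 2, s = 0,1,2);
    n_b[s]: individuals in B1 or B2 with eps* = s (M_i = 1, s = 0,1)."""
    ll = 0.0
    for s, n in enumerate(n_a):
        ll += n * math.log((1-p)*(1-b)**s*b**(2-s) + p*a**s*(1-a)**(2-s))
    for s, n in enumerate(n_b):
        ll += n * math.log((1-p)*(1-b)**s*b**(1-s) + p*a**s*(1-a)**(1-s))
    return -ll

def fit_psi(n_a, n_b):
    """Maximize L_c by a coarse grid search plus a shrinking local search,
    restricted to p <= 0.5; returns (p_hat, alpha_hat, beta_hat, psi_hat)."""
    grid = [i / 40 for i in range(1, 40)]
    _, p, a, b = min((neg_log_lc(p, a, b, n_a, n_b), p, a, b)
                     for p in grid if p <= 0.5 for a in grid for b in grid)
    step = 0.0125
    for _ in range(6):                       # shrinking pattern search
        _, p, a, b = min((neg_log_lc(p+i*step, a+j*step, b+k*step, n_a, n_b),
                          p+i*step, a+j*step, b+k*step)
                         for i in range(-2, 3) for j in range(-2, 3)
                         for k in range(-2, 3)
                         if 0 < p+i*step <= 0.5 and 0 < a+j*step < 1
                         and 0 < b+k*step < 1)
        step /= 2
    return p, a, b, (a * b) / ((1 - a) * (1 - b))

# Check with expected (noise-free) counts under p=0.2, alpha=0.75, beta=0.67
# (the pattern 3/4 parameter values from the simulation section):
p0, a0, b0 = 0.2, 0.75, 0.67
n_a = [6000 * comb(2, s) * (p0*a0**s*(1-a0)**(2-s) + (1-p0)*(1-b0)**s*b0**(2-s))
       for s in (0, 1, 2)]
n_b = [4000 * (p0*a0**s*(1-a0)**(1-s) + (1-p0)*(1-b0)**s*b0**(1-s))
       for s in (0, 1)]
print(fit_psi(n_a, n_b)[3])   # compare with the true psi = 0.75*0.67/(0.25*0.33)
```

The counts 6000 and 4000, the grid resolution, and the optimizer are all illustrative choices; the identifiability restriction p ≤ 0.5 (prevalence below 50%) is an added assumption, not stated in the paper.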

## Asymptotic distribution

The values of α and β that maximize L_{c} are denoted $\widehat{\alpha}$ and $\widehat{\beta}$. Under these circumstances, the plug-in estimator of ψ is given by the following equation:

$\widehat{\psi}=\frac{\widehat{\alpha}\widehat{\beta}}{\left(1-\widehat{\alpha}\right)\left(1-\widehat{\beta}\right)}.$

Let V_{ψ} denote $\text{Var}\left(\mathrm{log}\widehat{\psi}\right)$. When actually calculated, V_{ψ} is asymptotically given by the following equation:

${\text{V}}_{\text{\psi}}\approx \frac{1}{n}\{{\left(\frac{1}{\alpha}+\frac{1}{1-\alpha}\right)}^{2}\text{Var}\left(\sqrt{n}(\widehat{\alpha}-\alpha )\right)+$${\left(\frac{1}{\beta}+\frac{1}{1-\beta}\right)}^{2}\text{Var}\left(\sqrt{n}(\widehat{\beta}-\beta )\right)$

$+2\left(\frac{1}{\alpha}+\frac{1}{1-\alpha}\right)\left(\frac{1}{\beta}+\frac{1}{1-\beta}\right)$$\text{Cov}\left(\sqrt{n}(\widehat{\alpha}-\alpha ),\sqrt{n}(\widehat{\beta}-\beta )\right)\}.$

Using the asymptotic normality of $\widehat{\alpha}$ and $\widehat{\beta}$ and the delta method, $\mathrm{log}\widehat{\psi}{\to}_{\text{L}}\text{N}(\mathrm{log}\psi ,{\text{V}}_{\psi})$ as n→∞ can be derived (Appendix 2), where →_{L} denotes convergence in law. Using this asymptotic distribution, the 95% confidence interval for ψ is given by the following equation:

$\mathrm{exp}\left(\mathrm{log}\widehat{\psi}-1.96\sqrt{{\widehat{\text{V}}}_{\psi}}\right)\le \psi $$\le \mathrm{exp}\left(\mathrm{log}\widehat{\psi}+1.96\sqrt{{\widehat{\text{V}}}_{\psi}}\right).$
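For illustration, the interval can be computed as follows (a sketch, not the authors' code; `psi_hat` and `v_hat` are assumed to be the point estimate and the estimated variance of log ψ̂):

```python
import math

def psi_confidence_interval(psi_hat, v_hat, z=1.96):
    """95% confidence interval for psi: built on the log scale, then
    back-transformed, i.e. exp(log psi_hat -/+ z * sqrt(v_hat))."""
    half = z * math.sqrt(v_hat)
    return (psi_hat * math.exp(-half), psi_hat * math.exp(half))

# Hypothetical estimates psi_hat = 2.0 and v_hat = 0.04:
lo, hi = psi_confidence_interval(2.0, 0.04)
print(lo < 2.0 < hi)   # prints True
```

Because the interval is symmetric on the log scale, the product of the two limits equals psi_hat squared.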

## Follow-up design

In the previous section, the estimator and asymptotic distribution of ψ, the index used for comparing two binary diagnostic tests, were derived by focusing on the discordant pairs in the data obtained by applying the diagnostic tests twice to patients without a definitive diagnosis. Here, we describe the design of a follow-up trial for comparing diagnostic tests that uses ψ as the primary endpoint. To design a trial, a known distribution of the primary endpoint is required.

$\mathrm{log}\widehat{\psi}$ follows $\mathrm{log}\widehat{\psi}\sim \text{N}\left(\mathrm{log}\psi ,{\text{V}}_{\psi}\right)$ asymptotically, and the hypothesis to be tested is H_{0}: logψ=0 vs. H_{1}: logψ≠0. This is the framework of a standard single-arm trial. If the values of logψ and V_{ψ}, the significance level, and the power are fixed, the sample size needed to detect the difference is determined. However, because V_{ψ} is difficult to grasp intuitively, it may be difficult to specify at the design stage. To address this, we propose starting the trial without specifying V_{ψ}, estimating V_{ψ} once n_{0} individuals have accrued after the start of the trial, and determining the sample size needed to detect the difference by using this variance estimate. The order of V_{ψ} can be evaluated from the following equation:

${\text{V}}_{\psi}=\frac{A}{n}+{\text{o}}_{\text{p}}\left(\frac{1}{n}\right),\quad \text{as } n\to \infty ,$

where A is a constant. After the start of the trial, the variance is estimated once n_{0} individuals have accrued; the resulting value is denoted V_{ψ0}. The variance at the time when n_{1} cases have accrued, for an arbitrary n_{1}>n_{0}, can then be estimated according to the equation below:

${\text{V}}_{{\psi}_{\text{1}}}\approx \frac{{n}_{0}}{{n}_{1}}{\text{V}}_{{\psi}_{\text{0}}}$

Based on the above, let logψ_{1} denote the difference to be detected, Z_{c} the upper 100c% point of the standard normal distribution, *a* the level of significance, and 1−*b* the power. The sample size n_{1} needed to detect the difference with probability at least 1−*b* can then be designed according to the following equation:

${\text{n}}_{1}=\frac{{\left({Z}_{a/2}+{Z}_{b}\right)}^{2}{\text{V}}_{{\psi}_{1}}}{{\left(\mathrm{log}{\psi}_{1}\right)}^{2}}.$

Using the approximation of ${\text{V}}_{{\psi}_{\text{1}}}\approx \left({n}_{\text{0}}\text{/}{n}_{\text{1}}\right){\text{V}}_{{\psi}_{\text{0}}},$ we obtain the following equation,

${\text{n}}_{1}=\frac{\left({Z}_{a/2}+{Z}_{b}\right)\sqrt{{\text{n}}_{0}{\text{V}}_{{\psi}_{0}}}}{\left|\mathrm{log}{\psi}_{1}\right|}.$
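Under the stated approximation, the sample-size calculation reduces to a one-line formula. The sketch below hardcodes the standard normal percentage points for a two-sided significance level of 0.05 and power 0.8 (an assumed design choice, not values fixed by the paper):

```python
import math

def required_n1(n0, v_psi0, log_psi1, z_a2=1.959964, z_b=0.841621):
    """n1 = (Z_{a/2} + Z_b) * sqrt(n0 * V_{psi0}) / |log psi1|, rounded up,
    using the approximation V_{psi1} ~ (n0/n1) * V_{psi0}."""
    return math.ceil((z_a2 + z_b) * math.sqrt(n0 * v_psi0) / abs(log_psi1))

# Hypothetical interim values: n0 = 500 accrued patients, V_psi0 = 3.0,
# targeting a pattern-1-sized effect of |log psi1| = 0.41.
print(required_n1(500, 3.0, 0.41))   # 265 under these assumed inputs
```

Halving the target effect roughly doubles the required sample size, since n_{1} scales with 1/|logψ_{1}| rather than 1/(logψ_{1})² once the interim variance estimate is plugged in.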

## Simulation

Several concrete scenarios were designed, and the behavior of $\mathrm{log}\widehat{\psi}$ under the proposed method was examined numerically. The probabilities Pr(T_{N},T_{S}|D=1) and Pr(T_{N},T_{S}|D=0), as well as the prevalence p=Pr(D=1), were specified; four patterns (patterns 1 to 4) were considered (Table 1).

The patterns correspond to the 4 combinations of whether the prevalence was high or low and whether the new test was better or worse than the standard test. In pattern 1, the prevalence was low (p=0.05) and the new test was inferior to the standard test (logψ < 0). In pattern 2, the prevalence was high (p=0.2) and the new test was inferior to the standard test (logψ < 0). In pattern 3, the prevalence was low (p=0.05) and the new test was superior to the standard test (logψ > 0). In pattern 4, the prevalence was high (p=0.2) and the new test was superior to the standard test (logψ > 0). The true values of α, β and ψ were calculated from (1), (2) and the true conditional probabilities given in Table 1. In pattern 1, for example, α=0.2/(0.2+0.15)=0.57, β=0.1/(0.1+0.2)=0.33, ψ=(0.57×0.33)/{(1−0.57)×(1−0.33)}=0.67, and logψ=log(0.67)=−0.41. The true values of α, β, ψ and logψ for the other patterns are summarized in Table 2.
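The true values quoted above can be reproduced directly from the Table 1 probabilities via equations (1) and (2): α and β are the conditional probabilities of the two discordant outcomes given D=1 and D=0, respectively. A short sketch:

```python
import math

def true_alpha_beta_psi(pr_01_d1, pr_10_d1, pr_01_d0, pr_10_d0):
    """alpha = Pr(T*=1|D=1), beta = Pr(T*=0|D=0), psi from relation (2).

    pr_kl_dX is the true probability Pr((T_N,T_S)=(k,l) | D=X) from Table 1.
    """
    alpha = pr_10_d1 / (pr_10_d1 + pr_01_d1)
    beta = pr_01_d0 / (pr_01_d0 + pr_10_d0)
    psi = (alpha * beta) / ((1 - alpha) * (1 - beta))
    return alpha, beta, psi

# Pattern 1 (and 2): Pr((0,1)|D=1)=0.15, Pr((1,0)|D=1)=0.2,
#                    Pr((0,1)|D=0)=0.1,  Pr((1,0)|D=0)=0.2
a1, b1, psi1 = true_alpha_beta_psi(0.15, 0.2, 0.1, 0.2)
print(round(a1, 2), round(b1, 2), round(psi1, 2), round(math.log(psi1), 2))

# Pattern 3 (and 4): Pr((0,1)|D=1)=0.05, Pr((1,0)|D=1)=0.15,
#                    Pr((0,1)|D=0)=0.2,  Pr((1,0)|D=0)=0.1
a3, b3, psi3 = true_alpha_beta_psi(0.05, 0.15, 0.2, 0.1)
print(round(psi3, 2), round(math.log(psi3), 2))
```

The first line reproduces 0.57, 0.33, 0.67 and −0.41, and the second reproduces ψ=6 and logψ=1.79, matching Table 2.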

**Table 1:** Combination of the true probability of occurrence and true prevalence. The first four probability columns give Pr((T_N,T_S)|D=1); the last four give Pr((T_N,T_S)|D=0).

| Pattern | p | (0,0) | (0,1) | (1,0) | (1,1) | (0,0) | (0,1) | (1,0) | (1,1) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.05 | 0.1 | 0.15 | 0.2 | 0.55 | 0.6 | 0.1 | 0.2 | 0.1 |
| 2 | 0.2 | 0.1 | 0.15 | 0.2 | 0.55 | 0.6 | 0.1 | 0.2 | 0.1 |
| 3 | 0.05 | 0.1 | 0.05 | 0.15 | 0.7 | 0.5 | 0.2 | 0.1 | 0.2 |
| 4 | 0.2 | 0.1 | 0.05 | 0.15 | 0.7 | 0.5 | 0.2 | 0.1 | 0.2 |

**Table 2:** The true values of α, β, ψ and logψ.

| Pattern | α | β | ψ | logψ |
|---|---|---|---|---|
| 1, 2 | 0.57 | 0.33 | 0.67 | -0.41 |
| 3, 4 | 0.75 | 0.67 | 6 | 1.79 |

For each pattern, data composed of random numbers $\left\{\left({\epsilon}_{Ni1},{\epsilon}_{Si1},{\epsilon}_{Ni2},{\epsilon}_{Si2}\right);i=1,2,\mathrm{...},n\right\}$ were generated, the sets A, B_{1} and B_{2} were formed, and data sets consisting exclusively of discordant pairs were constructed. Next, the values $\widehat{p},\ \widehat{\alpha}$ and $\widehat{\beta}$ maximizing the likelihood (3) were computed, the estimate $\widehat{\psi}$ was calculated from $\widehat{\psi}=\widehat{\alpha}\widehat{\beta}/\left(1-\widehat{\alpha}\right)\left(1-\widehat{\beta}\right)$, and $\mathrm{log}\widehat{\psi}$ was obtained. This calculation was repeated 1,000 times, and the sample mean of the estimates of logψ, the Standard Error (SE), the bias and the Mean Squared Error (MSE) were calculated. The bias was defined as the sample mean minus the true value; a positive bias therefore indicates overestimation, and a negative bias indicates underestimation. The sample size drawn at the outset was set to n = 500, 1000, 2000, 5000 and 10,000 (note that this is not the number of discordant pairs). All calculations were performed using the statistical software R (Ver. 3.1.1). The results were as follows.
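The data-generation and classification step of the simulation can be sketched as follows. This is a Python re-implementation for illustration only (the original study used R); the cell probabilities are those of pattern 1:

```python
import random

# Pattern 1 cell probabilities, in cell order (0,0), (0,1), (1,0), (1,1).
P_D1 = [0.1, 0.15, 0.2, 0.55]   # Pr((T_N, T_S) | D = 1)
P_D0 = [0.6, 0.1, 0.2, 0.1]     # Pr((T_N, T_S) | D = 0)
CELLS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def simulate(n, p, rng):
    """Generate (e_N1, e_S1, e_N2, e_S2) for n patients and classify each
    patient into A (both occasions discordant), B1 (only first discordant)
    or B2 (only second discordant); fully concordant patients are dropped,
    as assumption (E1) allows."""
    sets = {"A": [], "B1": [], "B2": []}
    for _ in range(n):
        d = 1 if rng.random() < p else 0
        cells = P_D1 if d == 1 else P_D0
        (en1, es1), (en2, es2) = rng.choices(CELLS, weights=cells, k=2)
        disc1, disc2 = en1 != es1, en2 != es2
        if disc1 and disc2:
            sets["A"].append((en1, es1, en2, es2))
        elif disc1:
            sets["B1"].append((en1, es1, en2, es2))
        elif disc2:
            sets["B2"].append((en1, es1, en2, es2))
    return sets

sets = simulate(2000, 0.05, random.Random(1))
print({k: len(v) for k, v in sets.items()})
```

Each replicate of the simulation would feed the resulting discordant-pair counts into the maximization of the conditional likelihood (3) to obtain one realization of log ψ̂.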

In patterns 1 and 2, the new test was inferior (true logψ = −0.41); the prevalence was p=0.05 in pattern 1 and p=0.2 in pattern 2, and prevalence was the only parameter that differed between the two patterns. The estimation results are summarized in Table 3. Even as n increases, the bias is not stable in pattern 1; except at n=2000, a slight tendency to underestimate was found. On the other hand, the bias in pattern 2 is more unstable than that of pattern 1. The MSE was lower in pattern 2 than in pattern 1, indicating more accurate estimation at the higher prevalence. Next, we show the results for patterns 3 and 4.

**Table 3:** Results of the simulation for pattern 1 and pattern 2. The first four result columns refer to pattern 1 (p=0.05) and the last four to pattern 2 (p=0.20).

| n | mean | s.e. | bias | MSE | mean | s.e. | bias | MSE |
|---|---|---|---|---|---|---|---|---|
| 500 | -0.493 | 0.022 | -0.088 | 1.094 | -0.462 | 0.019 | -0.056 | 0.808 |
| 1000 | -0.509 | 0.021 | -0.103 | 0.997 | -0.392 | 0.018 | 0.014 | 0.7 |
| 2000 | -0.384 | 0.021 | 0.021 | 0.949 | -0.421 | 0.016 | -0.016 | 0.586 |
| 5000 | -0.409 | 0.019 | -0.003 | 0.78 | -0.342 | 0.015 | 0.063 | 0.48 |
| 10000 | -0.427 | 0.017 | -0.021 | 0.67 | -0.348 | 0.013 | 0.057 | 0.395 |

For patterns 3 and 4, the new test was superior to the standard test (true logψ = 1.79); the prevalence was p=0.05 in pattern 3 and p=0.2 in pattern 4. The accuracy was higher in pattern 4 than in pattern 3 (i.e., higher prevalence leads to a smaller SE). In addition, in both patterns 3 and 4, increasing the sample size was accompanied by a tendency to underestimate logψ on average. The numerical results of Tables 3 and 4 are summarized in Figure 1. The error bars in the figure show the 95% confidence interval for the mean, and the dotted line represents the true value of logψ. The lower half corresponds to patterns 1 and 2, and the upper half to patterns 3 and 4; triangles show values for p=0.05, and circles show values for p=0.2 (Figure 1).

**Table 4:** Results of the simulation for pattern 3 and pattern 4. The first four result columns refer to pattern 3 (p=0.05) and the last four to pattern 4 (p=0.20).

| n | mean | s.e. | bias | MSE | mean | s.e. | bias | MSE |
|---|---|---|---|---|---|---|---|---|
| 500 | 2.312 | 0.078 | 0.52 | 0.437 | 1.824 | 0.036 | 0.033 | 0.037 |
| 1000 | 1.993 | 0.042 | 0.201 | 0.088 | 1.68 | 0.024 | -0.112 | 0.028 |
| 2000 | 1.659 | 0.026 | -0.133 | 0.037 | 1.571 | 0.017 | -0.22 | 0.056 |
| 5000 | 1.567 | 0.02 | -0.225 | 0.062 | 1.515 | 0.011 | -0.277 | 0.08 |
| 10000 | 1.520 | 0.016 | -0.272 | 0.081 | 1.514 | 0.008 | -0.278 | 0.079 |

**Figure 1:** Sample mean and 95% confidence interval of the estimated values of logψ in each pattern.

## Discussion

In this paper, we proposed a parameter ψ for comparing diagnostic tests on the basis of data obtained by applying each binary diagnostic test twice to patients with no definitive diagnosis. The asymptotic distribution of log ψ̂ was derived by restricting the data to discordant pairs, and a method for determining the sample size was also discussed. The influence of restricting attention to discordant pairs on the estimation results is an issue that will need to be evaluated in the future. A comparison with estimation of logψ from the overall likelihood could be considered; however, when estimation is based on the overall likelihood, the number of parameters increases, and applying the diagnostic tests only twice per individual does not provide enough degrees of freedom to estimate all parameters simultaneously. Thus, the number of test applications required to estimate all parameters differs between the proposed method and the overall-likelihood-based method, which makes a direct comparison of the two approaches complicated.

The results of the numerical simulation showed a tendency, on average, to underestimate when the true value of logψ was positive. A positive logψ implies that the new test performs better than the standard test. From a researcher's perspective, trials can therefore be carried out with the assurance that if the new test is actually better, the comparison errs in a conservative direction, so the underestimation is believed to pose no major practical problem. However, the theoretical reasons for the underestimation need to be evaluated further. When the simulation results were examined in relation to prevalence, the estimates were more accurate when the prevalence was high than when it was low. With a high prevalence, individuals with D = 1 are included in larger numbers; for such individuals, the parameters (α) conditioned on D = 1 are estimated more accurately, and as a result the accuracy of logψ is considered to improve.

Our methodology allows the required number of cases to be determined once *n₀* individuals have been accumulated after the start of the trial. The problematic issue is then what value of *n₀* should be considered sufficient; the numerical simulations showed that even in the worst case (pattern 3 with n = 500), the S.E. of logψ was only about 0.078. Therefore, evaluating the dispersion at *n₀* = 500 may be adequate.
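Under the common assumption that the standard error shrinks at the 1/√n rate, the interim S.E. observed at *n₀* can be extrapolated to the sample size needed for a target precision. The helper below is an illustrative sketch of this extrapolation under that assumption, not the exact sample-size formula of the paper:

```python
import math

def required_n(n0, se0, target_se):
    """Project the total sample size needed to reach `target_se`,
    assuming the standard error of log-psi scales as 1/sqrt(n)
    from an interim standard error `se0` observed at `n0` cases."""
    return math.ceil(n0 * (se0 / target_se) ** 2)

# Halving an interim S.E. requires roughly four times the cases:
# required_n(500, 0.08, 0.04) -> 2000
```

Because the simulations suggest the S.E. is already small at n = 500, an interim evaluation at *n₀* = 500 gives a usable `se0` for this kind of projection.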

This study was conceived exclusively for patients without a definitive diagnosis; however, we expected that while the trial was underway, the definitive diagnosis of some individuals might become available. With the current methodology, there is no choice but to analyze such individuals in the same manner as those whose definitive diagnosis has not yet been determined. It would, however, be beneficial to incorporate information on definitive diagnoses obtained in mid-course of the trial and to develop a methodology allowing for more accurate estimation. This issue will be the topic of another paper.

## References

- Pepe MS. The Statistical Evaluation of Medical Tests for Classification and Prediction. New York: Oxford University Press. 2003.
- Jin H, Lu Y. A non-inferiority test of areas under two parametric ROC curves. Contemp Clin Trials. 2009; 30: 375-379.
- Berg WA, Blume JD, Cormack JB, Mendelson EB, Lehrer D, Bohm-Velez M, et al. Combined screening with ultrasound and mammography vs mammography alone in women at elevated risk of breast cancer. JAMA. 2008; 299: 2151-2163.
- Ohuchi N, Ishida T, Kawai M, Narikawa Y, Yamamoto S, Sobue T. Randomized controlled trial on effectiveness of ultrasonography screening for breast cancer in women aged 40-49 (J-START): research design. Jpn J Clin Oncol. 2011; 41: 275-277.
- Murotani K, Aoyama Y, Nagata S, Yanagawa T. Exact method for comparing two diagnostic tests with multiple readers based on categorical measurements. J Biometrics. 2009; 30: 69-79.
- Hui SL, Walter SD. Estimating the error rates of diagnostic tests. Biometrics. 1980; 36: 167-171.
- Yanagawa T, Kasagi F. Estimating prevalence and incidence of disease from a diagnostic test. Statistical Theory and Data Analysis, Amsterdam: Elsevier. 1985.
- Vacek PM. The effect of conditional dependence on the evaluation of diagnostic tests. Biometrics. 1985; 41: 959-968.
- Torrance-Rynard VL, Walter SD. Effects of dependent errors in the assessment of diagnostic test performance. Stat Med. 1997; 16: 2157-2175.