Follow-up Design for Comparing Two Binary Diagnostic Tests

Research Article

Austin Biom and Biostat. 2015;2(2): 1016.

# Follow-up Design for Comparing Two Binary Diagnostic Tests

Kenta Murotani¹*, Akihiro Hirakawa², Yoshiko Aoyama³ and Takashi Yanagawa³

¹Center for Clinical Research, Aichi Medical University, Japan

²Center for Advanced Medicine and Clinical Research, Nagoya University Hospital, Japan

³The Biostatistics Center, Kurume University, Japan

*Corresponding author: Kenta Murotani, Center for Clinical Research, Aichi Medical University, 1-1 Yazakokarimata, Nagakute, Aichi, Japan.

Received: January 06, 2015; Accepted: May 11, 2015; Published: May 26, 2015

## Abstract

Most conventional methods of comparing two diagnostic tests require patients whose true disease statuses are known. In this paper, we address the problem of comparing two binary diagnostic tests (referred to as the new and standard tests) in a follow-up design in which no gold standard is available. Each patient is assumed to be examined twice by both the new test and the standard test. We employ a comparison measure ψ, defined as the ratio of the odds ratio of the new test to that of the standard test. It is not possible to estimate ψ from the full likelihood function based on the design, even if two independent multinomial distributions are assumed for the data. Therefore, we focus only on data from discordant pairs between the new and standard tests. We construct a conditional likelihood conditioned on those pairs and estimate the parameters involved in it. An estimate of ψ is obtained by plugging those estimates into ψ. The asymptotic normality of the estimator of ψ is shown using the delta method, and a confidence interval for ψ is developed. A method of sample size determination for this design is also proposed. Simulations are conducted to study the behavior of the proposed method under several scenarios.

Keywords: Follow-up design; Diagnostic test; Comparison; No gold standard

## Introduction

Accurate diagnosis of the patient is crucial when planning the treatment of a disease. Once an accurate diagnosis has been determined, the patient can begin receiving adequate treatment. An accurate evaluation and selection of the diagnostic method therefore plays an important role in the patient's health. A medical method that aims at determining whether a patient is affected by a disease is called a “diagnostic test”. In particular, diagnostic tests that express the strength of suspicion of a disease on a binary scale (“positive” or “negative”) are called “binary diagnostic tests”. To determine which of two binary diagnostic tests is statistically better, the sensitivity and specificity must be closely examined [1,2]. Sensitivity and specificity are defined as Sensitivity = Pr(T=1|D=1) and Specificity = Pr(T=0|D=0), where T indicates the result of the binary diagnostic test and D indicates the actual disease status. T=1 (D=1) indicates a positive result (disease), and T=0 (D=0) indicates a negative result (no disease). Sensitivity is the conditional probability that a patient who is actually diseased is diagnosed as positive, and specificity is the conditional probability that a patient who is not actually diseased is diagnosed as negative. In both cases, values closer to 1 mean that the diagnostic test is more accurate. If the actual disease status (D) of each patient is observed, the sensitivity and specificity can be estimated simply by calculating the corresponding proportions.
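As a minimal sketch of the proportion estimates described above, the following Python function computes sensitivity and specificity when D is observed. The function name and the toy data are hypothetical and are not taken from the paper.

```python
def sensitivity_specificity(t, d):
    """Estimate sensitivity and specificity as simple proportions.

    t: test results (1 = positive, 0 = negative)
    d: true disease status (1 = diseased, 0 = not diseased)
    """
    if len(t) != len(d):
        raise ValueError("t and d must have the same length")
    n_diseased = sum(1 for di in d if di == 1)
    n_healthy = sum(1 for di in d if di == 0)
    if n_diseased == 0 or n_healthy == 0:
        raise ValueError("need at least one diseased and one non-diseased patient")
    # Sensitivity: share of diseased patients with a positive test.
    true_pos = sum(1 for ti, di in zip(t, d) if ti == 1 and di == 1)
    # Specificity: share of non-diseased patients with a negative test.
    true_neg = sum(1 for ti, di in zip(t, d) if ti == 0 and di == 0)
    return true_pos / n_diseased, true_neg / n_healthy


# Hypothetical toy data (illustration only).
t = [1, 1, 0, 1, 0, 0, 0, 1]
d = [1, 1, 1, 1, 0, 0, 0, 0]
sens, spec = sensitivity_specificity(t, d)
print(sens, spec)
```

This only works when D is available for every patient, which is exactly the requirement the follow-up design is meant to avoid.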

However, an accurate observation of the value of D involves methods that are often invasive for the patient. In the case of cancers, for example, the value of D can be assessed only by collecting cell samples through biopsy or surgery, and by determining the diagnosis in a comprehensive manner by using pathological and histological methods.

An example of an actual study is that of Berg et al. [3], who performed biopsy in patients with elevated risk of breast cancer to obtain a definitive diagnosis, in order to examine whether “mammography alone” and “mammography combined with ultrasound” were effective as diagnostic tests for breast cancer detection. Similarly, in Japan, a large-scale randomized controlled trial of breast cancer screening methods (mammography alone vs. mammography combined with ultrasound) is being conducted on 100,000 women in their 40s [4]. In this study, the definitive diagnosis is determined on the basis of biopsy or surgery for patients whose overall screening results indicate a need for thorough examination. These examples illustrate two important issues: a direct comparison of sensitivity and specificity requires information on the definitive diagnosis, and obtaining that information imposes a heavy burden on both patients and health care workers.

Therefore, in this paper, we propose a methodology for comparing two binary diagnostic tests (referred to hereinafter as the “new test” and the “standard test”) in the absence of a definitive diagnosis, and discuss a follow-up design based on this methodology. The characteristic feature of the method is that each patient undergoes both the new test and the standard test twice within a short period, and that the analysis focuses on occasions on which the new and standard tests gave discordant results. The paper is organized as follows: Section 2 summarizes the criteria considered when comparing two diagnostic tests; Section 3 proposes the methodology; Section 4 reports numerical simulations under several scenarios; and Section 5 provides a discussion.

## Comparison measure

Let $T_N, T_S \in \{0,1\}$ be random variables representing the results of the two binary diagnostic tests, namely the new test and the standard test. Murotani et al. [5] previously summarized the criteria for comparing the standard test and the new test as (C1), (C2), (C3), and (C4):

(C1) $\Pr(T_N=1\mid D=1) > \Pr(T_S=1\mid D=1)$ and $\Pr(T_N=0\mid D=0) > \Pr(T_S=0\mid D=0)$,

(C2) $\Pr(D=1\mid T_N=1) > \Pr(D=1\mid T_S=1)$ and $\Pr(D=0\mid T_N=0) > \Pr(D=0\mid T_S=0)$,

(C3) $\Pr(T_N=1\mid D=1) + \Pr(T_N=0\mid D=0) > \Pr(T_S=1\mid D=1) + \Pr(T_S=0\mid D=0)$, and

(C4) $\frac{\Pr(T_N=1\mid D=1)\,\Pr(T_N=0\mid D=0)}{\Pr(T_N=0\mid D=1)\,\Pr(T_N=1\mid D=0)} > \frac{\Pr(T_S=1\mid D=1)\,\Pr(T_S=0\mid D=0)}{\Pr(T_S=0\mid D=1)\,\Pr(T_S=1\mid D=0)}.$

(C1) compares the tests on the basis of sensitivity and specificity. In (C2), the conditioning used in (C1) is reversed: the comparison is based on the probability that the patient's actual condition is “presence of disease” (“absence of disease”) given that the test result is positive (negative). In other words, (C2) compares the diagnostic tests on their capability to predict the disease status. (C3) compares the tests on the sum of sensitivity and specificity, which is equivalent to selecting the diagnostic test with the larger area under the curve (AUC). (C4) compares the tests on the basis of the odds ratios of the new test and the standard test.

The meaning of criterion (C4) is as follows. When $T_N=1$, the predictive capacity is expressed by the odds

$O_1 = \Pr(D=1\mid T_N=1)/\Pr(D=0\mid T_N=1).$

When $T_N=0$, the predictive capacity is expressed by the odds

$O_2 = \Pr(D=0\mid T_N=0)/\Pr(D=1\mid T_N=0).$

The larger the predictive value of $T_N=1$, the greater the value of $O_1$. The larger the predictive value of $T_N=0$, the greater the value of $O_2$; in other words, the lower the value of $O_2^{-1}$. Therefore, the ratio of the two, $O_1/O_2^{-1}$, expresses the strength of the association between the new test and $D$. Higher values of this ratio indicate that the new test is a good diagnostic test. The odds ratio of the standard test is defined in the same way, and (C4) compares the odds ratio of the new test with that of the standard test. In this paper, the diagnostic tests are compared in the sense of (C4).
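The link between the predictive odds $O_1$, $O_2$ and the odds ratio appearing in (C4) can be checked with Bayes' theorem. The following is a short sketch, where $p=\Pr(D=1)$ denotes the prevalence and is used here only for this check:

$\begin{aligned} O_1 &= \frac{\Pr(D=1\mid T_N=1)}{\Pr(D=0\mid T_N=1)} = \frac{p\,\Pr(T_N=1\mid D=1)}{(1-p)\,\Pr(T_N=1\mid D=0)},\\ O_2 &= \frac{\Pr(D=0\mid T_N=0)}{\Pr(D=1\mid T_N=0)} = \frac{(1-p)\,\Pr(T_N=0\mid D=0)}{p\,\Pr(T_N=0\mid D=1)},\\ O_1 O_2 &= \frac{\Pr(T_N=1\mid D=1)\,\Pr(T_N=0\mid D=0)}{\Pr(T_N=0\mid D=1)\,\Pr(T_N=1\mid D=0)}. \end{aligned}$

The prevalence cancels in the product $O_1 O_2$ (equivalently, $O_1$ divided by $O_2^{-1}$), which leaves the left-hand side of (C4).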

The parameter summarizing criterion (C4) is defined by the following equation:

$\psi = \frac{\Pr(T_N=1\mid D=1)\,\Pr(T_N=0\mid D=0)}{\Pr(T_N=0\mid D=1)\,\Pr(T_N=1\mid D=0)} \bigg/ \frac{\Pr(T_S=1\mid D=1)\,\Pr(T_S=0\mid D=0)}{\Pr(T_S=0\mid D=1)\,\Pr(T_S=1\mid D=0)}.$

According to criterion (C4), the value of ψ is interpreted as follows: ψ > 1 if $T_N$ is superior to $T_S$; ψ = 1 if $T_N$ and $T_S$ are equivalent; and ψ < 1 if $T_N$ is inferior to $T_S$.

Thus, ψ is a criterion for the comparison of the two diagnostic tests.
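To make the definition of ψ concrete, here is a small Python sketch that evaluates it from the sensitivity and specificity of each test. The function names and the numerical inputs are hypothetical; in the setting of this paper these quantities are not directly observable, so the snippet only illustrates the definition.

```python
def diagnostic_odds_ratio(sens, spec):
    """Odds ratio of a single binary test: sens*spec / ((1-sens)*(1-spec))."""
    if not (0.0 < sens < 1.0 and 0.0 < spec < 1.0):
        raise ValueError("sens and spec must lie strictly between 0 and 1")
    return (sens * spec) / ((1.0 - sens) * (1.0 - spec))


def psi(sens_new, spec_new, sens_std, spec_std):
    """Ratio of the odds ratio of the new test to that of the standard test."""
    return (diagnostic_odds_ratio(sens_new, spec_new)
            / diagnostic_odds_ratio(sens_std, spec_std))


# Hypothetical values: the new test has higher sensitivity, equal specificity.
value = psi(0.9, 0.8, 0.8, 0.8)
print(value)  # a value above 1 favours the new test under (C4)
```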

If ψ can be estimated from the data, then the two binary diagnostic tests can be compared on the basis of the estimated value. In addition, if the distribution of the estimator of ψ can be obtained, then hypothesis tests on ψ and confidence intervals for ψ can be constructed, and a follow-up design for the comparison of two binary diagnostic tests, including the planning of the number of cases, can be proposed. In the absence of a definitive diagnosis (that is, when D is not observed), we derive the estimate of ψ and its asymptotic distribution under several assumptions, using the data obtained by applying the new test and the standard test twice to each patient. The following sections describe the methodology in concrete terms.

## Notation and definition

Let $\{T_{Nij}\,(T_{Sij});\ i=1,2,\ldots,n,\ j=1,2\}$ be random variables representing the result of the new test (standard test) that the $i$th patient underwent for the $j$th time, and let $\{D_i;\ i=1,2,\ldots,n\}$ be random variables representing the actual disease status of the $i$th individual. Note that $D_i$ does not depend on $j$; that is, the actual disease status is assumed to remain unchanged between the first and second applications of the new test and the standard test. This can be ensured by applying the two diagnostic tests within a relatively short period, during which the actual condition of the disease remains unchanged. $D_i$ is an unobserved random variable. $T_{Nij}$, $T_{Sij}$, and $D_i$ are binary random variables in which 1 means positive (disease) and 0 means negative (no disease). In addition, it is assumed that $p=\Pr(D_i=1)$ for all $i$.

The value $p$ represents the prevalence. If $(\epsilon_{Nij},\epsilon_{Sij})$ denote the observed values of $(T_{Nij},T_{Sij})$, the data obtained by applying the new test and the standard test twice to $n$ patients without a definitive diagnosis are expressed as $(\epsilon_{Ni1},\epsilon_{Si1},\epsilon_{Ni2},\epsilon_{Si2}),\ i=1,2,\ldots,n.$

The cell probability $p_{ikl}$ is defined as $p_{ikl}=\Pr(T_{Nij}=k,\,T_{Sij}=l),\ k,l\in\{0,1\}$. In addition, the cell probabilities given the actual disease status are defined as $q_{Dkli}=\Pr(T_{Nij}=k,\,T_{Sij}=l\mid D_i=1)$ and $q_{\bar{D}kli}=\Pr(T_{Nij}=k,\,T_{Sij}=l\mid D_i=0)$ for all $i$, $j$, $k$, and $l$. The probabilities $p_{ikl}$, $q_{Dkli}$, and $q_{\bar{D}kli}$ do not depend on $j$; that is, the cell probabilities are the same for the first and the second diagnostic results.

## Design based approach

In this section, we consider the probability distribution implied by the sampling design and construct the likelihood. The new and standard tests are each applied twice to the $i$th patient, and therefore the $j$th ($j=1,2$) diagnostic results can be summarized in a 2×2 contingency table. When the two-dimensional random variable $(T_{Ni1},T_{Si1})$, representing the first results of the new and standard tests for the $i$th patient, and $(T_{Ni2},T_{Si2})$, representing the second results, follow mutually independent multinomial distributions, the likelihood for the $i$th patient can be expressed as

$\begin{aligned} & p_{i00}^{(1-\epsilon_{Ni1})(1-\epsilon_{Si1})}\, p_{i01}^{(1-\epsilon_{Ni1})\epsilon_{Si1}}\, p_{i10}^{\epsilon_{Ni1}(1-\epsilon_{Si1})}\, p_{i11}^{\epsilon_{Ni1}\epsilon_{Si1}} \\ &\quad\times p_{i00}^{(1-\epsilon_{Ni2})(1-\epsilon_{Si2})}\, p_{i01}^{(1-\epsilon_{Ni2})\epsilon_{Si2}}\, p_{i10}^{\epsilon_{Ni2}(1-\epsilon_{Si2})}\, p_{i11}^{\epsilon_{Ni2}\epsilon_{Si2}}. \end{aligned}$

In addition, because the actual disease status is unknown, the cell probability $p_{ikl}$ is a mixture with mixing ratio $p$, namely $p_{ikl}=p\,q_{Dkli}+(1-p)\,q_{\bar{D}kli}$. In summary, the overall likelihood function $L$ of the $n$ patients is given by

$\begin{aligned} L=\prod_{i=1}^{n} & \left\{\left(p\,q_{D00i}+(1-p)\,q_{\bar{D}00i}\right)^{\sum_{j=1}^{2}(1-\epsilon_{Nij})(1-\epsilon_{Sij})}\right\} \left\{\left(p\,q_{D01i}+(1-p)\,q_{\bar{D}01i}\right)^{\sum_{j=1}^{2}(1-\epsilon_{Nij})\epsilon_{Sij}}\right\} \\ &\times \left\{\left(p\,q_{D10i}+(1-p)\,q_{\bar{D}10i}\right)^{\sum_{j=1}^{2}\epsilon_{Nij}(1-\epsilon_{Sij})}\right\} \left\{\left(p\,q_{D11i}+(1-p)\,q_{\bar{D}11i}\right)^{\sum_{j=1}^{2}\epsilon_{Nij}\epsilon_{Sij}}\right\}. \end{aligned}$
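The structure of this likelihood can be sketched in a few lines of Python. The sketch assumes, for simplicity, that the conditional cell probabilities do not depend on the patient index $i$; the dictionaries `q_d` and `q_nd` and all numerical values are hypothetical and serve only to show how the mixture enters each factor.

```python
import math


def cell_probs(p, q_d, q_nd):
    """Mixture cell probabilities p_kl = p * q_D[k,l] + (1 - p) * q_notD[k,l]."""
    return {kl: p * q_d[kl] + (1.0 - p) * q_nd[kl] for kl in q_d}


def log_likelihood(data, p, q_d, q_nd):
    """Log of the overall likelihood L.

    data: list of tuples (eN1, eS1, eN2, eS2), one per patient, holding the
    first and second results of the new (N) and standard (S) tests.
    """
    probs = cell_probs(p, q_d, q_nd)
    total = 0.0
    for e_n1, e_s1, e_n2, e_s2 in data:
        # The two occasions are treated as independent multinomial draws.
        total += math.log(probs[(e_n1, e_s1)]) + math.log(probs[(e_n2, e_s2)])
    return total


# Hypothetical conditional cell probabilities, keyed by (new, standard).
q_d = {(0, 0): 0.05, (0, 1): 0.15, (1, 0): 0.20, (1, 1): 0.60}
q_nd = {(0, 0): 0.60, (0, 1): 0.10, (1, 0): 0.20, (1, 1): 0.10}
probs = cell_probs(0.05, q_d, q_nd)
ll = log_likelihood([(1, 0, 1, 0), (0, 0, 0, 1)], 0.05, q_d, q_nd)
```

Because the prevalence and the conditional probabilities only appear through the mixed cell probabilities, different parameter combinations can give the same value of `probs`, which is one way to see why ψ is hard to recover from $L$ alone.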

When $q_{D10i}\,q_{\bar{D}01i}/(q_{D01i}\,q_{\bar{D}10i})$ does not depend on $i$ or $j$, and the results of the new and standard tests are mutually independent conditionally on the actual disease status, ψ can be expressed by the following equation:

$\psi=\frac{q_{D10i}\,q_{\bar{D}01i}}{q_{D01i}\,q_{\bar{D}10i}}.$

When ψ is estimated from the overall likelihood $L$, it is important to know whether $L$ belongs to an exponential family. If it did, ψ could be estimated from the conditional likelihood obtained by conditioning on the sufficient statistics for the nuisance parameters other than ψ. Unfortunately, $L$ is not an exponential family, and it is therefore difficult to estimate ψ in this way.

## Conditional approach

When the overall likelihood is constructed by assuming the multinomial distributions implied by the design, the cell probabilities are mixtures of those of the non-diseased group and those of the diseased group, with the prevalence as the mixing ratio. Thus, the overall likelihood is not an exponential family, and ψ cannot be estimated on the basis of sufficient statistics. In this section, we restrict the data used in the analysis and propose a new approach based on a conditional likelihood function.

First, we assume the following (E1):

(E1) Data in which the results of the new test and the standard test agree with each other are not relevant to the comparison of the diagnostic tests.

In other words, (E1) states that data in which the new test and the standard test produced the same result need not be taken into consideration in the analysis. Under assumption (E1), only the pairs of results in which the new test and the standard test differed from each other (discordant pairs) are considered. The following sets A, B1, and B2 are therefore defined according to the occasions on which the new test and the standard test disagreed.

$A=\{i: (T_{Ni1},T_{Si1},T_{Ni2},T_{Si2})\in\{(0,1,0,1),(0,1,1,0),(1,0,0,1),(1,0,1,0)\}\},$

$B_1=\{i: (T_{Ni1},T_{Si1},T_{Ni2},T_{Si2})\in\{(0,1,1,1),(0,1,0,0),(1,0,1,1),(1,0,0,0)\}\},$

$B_2=\{i: (T_{Ni1},T_{Si1},T_{Ni2},T_{Si2})\in\{(1,1,0,1),(0,0,0,1),(1,1,1,0),(0,0,1,0)\}\}.$

A is the set of individuals for whom the results of the new test and the standard test differed on both the first and the second occasion. B1 (B2) is the set of individuals for whom the results of the new test and the standard test differed only on the first (second) occasion.
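The partition into A, B1, and B2 can be sketched as a small Python helper. The function name `classify` is hypothetical; the rule it applies is simply whether the two tests disagree on the first occasion, the second occasion, or both.

```python
def classify(e_n1, e_s1, e_n2, e_s2):
    """Assign one patient's four results to 'A', 'B1', 'B2', or None.

    e_n1, e_s1: first results of the new and standard tests
    e_n2, e_s2: second results of the new and standard tests
    """
    discordant_first = e_n1 != e_s1
    discordant_second = e_n2 != e_s2
    if discordant_first and discordant_second:
        return "A"   # tests disagree on both occasions
    if discordant_first:
        return "B1"  # tests disagree on the first occasion only
    if discordant_second:
        return "B2"  # tests disagree on the second occasion only
    return None      # concordant on both occasions: not used under (E1)


# Hypothetical records (eN1, eS1, eN2, eS2).
records = [(0, 1, 1, 0), (1, 0, 1, 1), (0, 0, 0, 1), (1, 1, 0, 0)]
labels = [classify(*r) for r in records]
print(labels)
```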

For $i\in A\cup B_1\cup B_2$, $T^{*}_{ij'}$ is defined by the following equation:

$T^{*}_{ij'}=\begin{cases} 1 & \text{if } (T_{Nij'},T_{Sij'})=(1,0),\\ 0 & \text{if } (T_{Nij'},T_{Sij'})=(0,1), \end{cases}\qquad j'=1,2,$

where

$\Pr(T^{*}_{ij'}=1)=\frac{\Pr(T_{Nij'}=1,\,T_{Sij'}=0)}{\Pr(T_{Nij'}=1,\,T_{Sij'}=0)+\Pr(T_{Nij'}=0,\,T_{Sij'}=1)}.$

For $i\in A$, the observed values of $(T^{*}_{i1},T^{*}_{i2})$ are denoted by $(\epsilon^{*}_{i1},\epsilon^{*}_{i2})$. In the same manner, for $i\in B_1$ the observed value of $T^{*}_{iB_1}=T^{*}_{i1}$ is denoted by $\epsilon^{*}_{iB_1}$, and for $i\in B_2$ the observed value of $T^{*}_{iB_2}=T^{*}_{i2}$ is denoted by $\epsilon^{*}_{iB_2}$. In addition, for the $i$th individual, $M_i$ is defined as $M_i=2$ for $i\in A$ and as $M_i=1$ for $i\in B_1\cup B_2$. Assumptions (A1), (A2), and (A3) are as follows:

(A1) For $i\in A$, $\Pr(T^{*}_{i1}=\epsilon^{*}_{i1},\,T^{*}_{i2}=\epsilon^{*}_{i2}\mid D_i=\epsilon_i)=\prod_{j'=1}^{2}\Pr(T^{*}_{ij'}=\epsilon^{*}_{ij'}\mid D_i=\epsilon_i),$

(A2) $\alpha=\Pr(T^{*}_{ij'}=1\mid D_i=1)$, $\beta=\Pr(T^{*}_{ij'}=0\mid D_i=0)$, $j'=1,2$, $i\in A\cup B_1\cup B_2$, and

(A3) $(T^{*}_{i1},T^{*}_{i2}),\ i\in A$; $T^{*}_{iB_1},\ i\in B_1$; and $T^{*}_{iB_2},\ i\in B_2$ are mutually independent.

(A1) assumes that, for the $i$th individual, $T^{*}_{i1}$ and $T^{*}_{i2}$ are mutually independent given the actual disease status. Similar assumptions have previously been used by Hui and Walter [6] and Yanagawa and Kasagi [7], and are commonly known as conditional independence. Because this assumption is somewhat strong, Vacek [8] and Torrance-Rynard and Walter [9] have examined the effect of departures from it on the estimation of sensitivity and specificity.

(A2) assumes that the sensitivity and specificity defined in terms of $T^{*}_{ij'}$ are constant and do not depend on $i$ or $j'$. (A3) assumes that each individual is independent of the other individuals. The following important relationship exists between ψ and the two parameters α and β:

$\psi=\frac{\alpha\beta}{(1-\alpha)(1-\beta)}.\qquad (2)$

This relationship shows that the conditional maximum likelihood estimator of ψ can be obtained by plugging the values of α and β that maximize $L_c$ into the right-hand side of (2). Under (A1), (A2), and (A3), the conditional likelihood function $L_c$ is given by the following equation (Appendix 1):

$\begin{aligned} L_c(p,\alpha,\beta)= {} & \prod_{i\in A}\left\{(1-p)(1-\beta)^{\epsilon^{*}_{i1}+\epsilon^{*}_{i2}}\,\beta^{M_i-\epsilon^{*}_{i1}-\epsilon^{*}_{i2}}+p\,\alpha^{\epsilon^{*}_{i1}+\epsilon^{*}_{i2}}(1-\alpha)^{M_i-\epsilon^{*}_{i1}-\epsilon^{*}_{i2}}\right\} \\ &\times\prod_{i\in B_1}\left\{(1-p)(1-\beta)^{\epsilon^{*}_{iB_1}}\,\beta^{M_i-\epsilon^{*}_{iB_1}}+p\,\alpha^{\epsilon^{*}_{iB_1}}(1-\alpha)^{M_i-\epsilon^{*}_{iB_1}}\right\} \\ &\times\prod_{i\in B_2}\left\{(1-p)(1-\beta)^{\epsilon^{*}_{iB_2}}\,\beta^{M_i-\epsilon^{*}_{iB_2}}+p\,\alpha^{\epsilon^{*}_{iB_2}}(1-\alpha)^{M_i-\epsilon^{*}_{iB_2}}\right\}. \end{aligned}$
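Every factor of the conditional likelihood has the same mixture form, which makes it easy to evaluate numerically. The sketch below assumes each individual is summarized by a pair $(M_i, s_i)$, where $s_i$ is the sum of the observed $T^{*}$ values over that individual's discordant occasions; this encoding and the function name are hypothetical conveniences, and the snippet only evaluates $\log L_c$ rather than maximizing it.

```python
import math


def conditional_log_likelihood(records, p, alpha, beta):
    """Evaluate log L_c(p, alpha, beta).

    records: iterable of (m, s) pairs, where m is the number of discordant
    occasions for the individual (2 for set A, 1 for B1 or B2) and s is the
    sum of the observed T* values over those occasions (0 <= s <= m).
    """
    total = 0.0
    for m, s in records:
        # Contribution if the individual is not diseased (probability 1 - p).
        not_diseased = (1.0 - p) * (1.0 - beta) ** s * beta ** (m - s)
        # Contribution if the individual is diseased (probability p).
        diseased = p * alpha ** s * (1.0 - alpha) ** (m - s)
        total += math.log(not_diseased + diseased)
    return total


# Hypothetical data: one individual from A with both T* equal to 1,
# and one individual from B1 (or B2) with T* equal to 0.
ll_c = conditional_log_likelihood([(2, 2), (1, 0)], p=0.2, alpha=0.6, beta=0.7)
print(ll_c)
```

A numerical optimizer could be wrapped around this function to search for maximizing values of α and β; the choice of optimizer and starting values is left open here.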

## Asymptotic distribution

The values of α and β that maximize $L_c$ are denoted by $\hat{\alpha}$ and $\hat{\beta}$. The plug-in estimator of ψ is then given by the following equation:

$\hat{\psi}=\frac{\hat{\alpha}\hat{\beta}}{(1-\hat{\alpha})(1-\hat{\beta})}.$
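The form of this estimator can be traced back to the cell probabilities. As a sketch, assume (this step is not spelled out in the text) that conditioning $T^{*}_{ij'}$ on the disease status gives $\alpha=q_{D10i}/(q_{D10i}+q_{D01i})$ and $\beta=q_{\bar{D}01i}/(q_{\bar{D}10i}+q_{\bar{D}01i})$. Then

$\begin{aligned} \frac{\alpha}{1-\alpha}&=\frac{q_{D10i}}{q_{D01i}},\qquad \frac{\beta}{1-\beta}=\frac{q_{\bar{D}01i}}{q_{\bar{D}10i}},\\ \frac{\alpha\beta}{(1-\alpha)(1-\beta)}&=\frac{q_{D10i}\,q_{\bar{D}01i}}{q_{D01i}\,q_{\bar{D}10i}}=\psi, \end{aligned}$

so replacing α and β by their estimates yields $\hat{\psi}$.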

Let $V_\psi$ denote $\mathrm{Var}(\log\hat{\psi})$. $V_\psi$ is asymptotically given by the following equation:

$\begin{aligned} V_\psi\approx\frac{1}{n}\Bigl\{ & \left(\frac{1}{\alpha}+\frac{1}{1-\alpha}\right)^{2}\mathrm{Var}\left(\sqrt{n}(\hat{\alpha}-\alpha)\right)+\left(\frac{1}{\beta}+\frac{1}{1-\beta}\right)^{2}\mathrm{Var}\left(\sqrt{n}(\hat{\beta}-\beta)\right) \\ & +2\left(\frac{1}{\alpha}+\frac{1}{1-\alpha}\right)\left(\frac{1}{\beta}+\frac{1}{1-\beta}\right)\mathrm{Cov}\left(\sqrt{n}(\hat{\alpha}-\alpha),\sqrt{n}(\hat{\beta}-\beta)\right)\Bigr\}. \end{aligned}$

Using the asymptotic normality of $\hat{\alpha}$ and $\hat{\beta}$ together with the delta method, $\log\hat{\psi}\to_{L}N(\log\psi,V_\psi)$ as $n\to\infty$ can be derived (Appendix 2), where $\to_{L}$ denotes convergence in law.

Using this asymptotic distribution, the 95% confidence interval of ψ is given by the following equation:

$\exp\left(\log\hat{\psi}-1.96\sqrt{\hat{V}_\psi}\right)\le\psi\le\exp\left(\log\hat{\psi}+1.96\sqrt{\hat{V}_\psi}\right).$
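The delta-method variance and the resulting interval can be sketched in Python as follows. The sketch assumes that estimates of the variances and covariance of $\hat{\alpha}$ and $\hat{\beta}$ themselves (not of their $\sqrt{n}$-scaled versions) are already available, for example from the inverse of an observed information matrix; how they are obtained is not shown, and the function names and inputs are hypothetical.

```python
import math


def log_psi_variance(alpha, beta, var_alpha, var_beta, cov_alpha_beta):
    """Delta-method variance of log(psi_hat).

    var_alpha, var_beta, cov_alpha_beta are the (co)variances of the
    estimators alpha_hat and beta_hat on their original scale.
    """
    # Derivatives of log(psi) = log(a) - log(1-a) + log(b) - log(1-b).
    g_alpha = 1.0 / alpha + 1.0 / (1.0 - alpha)
    g_beta = 1.0 / beta + 1.0 / (1.0 - beta)
    return (g_alpha ** 2 * var_alpha
            + g_beta ** 2 * var_beta
            + 2.0 * g_alpha * g_beta * cov_alpha_beta)


def psi_confidence_interval(alpha_hat, beta_hat, v_hat, z=1.96):
    """Plug-in estimate of psi and its Wald-type interval on the log scale."""
    psi_hat = alpha_hat * beta_hat / ((1.0 - alpha_hat) * (1.0 - beta_hat))
    half_width = z * math.sqrt(v_hat)
    lower = math.exp(math.log(psi_hat) - half_width)
    upper = math.exp(math.log(psi_hat) + half_width)
    return psi_hat, lower, upper


# Hypothetical inputs for illustration only.
v = log_psi_variance(0.5, 0.5, 0.01, 0.01, 0.0)
psi_hat, lower, upper = psi_confidence_interval(0.5, 0.5, v)
print(psi_hat, lower, upper)
```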

## Follow-up design

In the previous section, the estimator of ψ, which is used as an index for the comparison of two binary diagnostic tests, and its asymptotic distribution were derived by focusing on the discordant pairs in the data obtained by applying the diagnostic tests twice to patients without a definitive diagnosis. Here, we describe the design of a follow-up trial for the comparison of diagnostic tests using ψ as the primary endpoint. To design a trial, the distribution of the primary endpoint must be known.

The estimator $\log\hat{\psi}$ is asymptotically normally distributed, and the hypotheses to be tested are $H_0:\log\psi=0$ vs. $H_1:\log\psi\neq 0$. This is the framework of a standard single-arm trial. If the values of $\log\psi$ and $V_\psi$, the level of significance, and the power are fixed, then the sample size needed to detect the difference is determined. However, because $V_\psi$ is a quantity that is difficult to understand intuitively, it is likely to be difficult to specify during the design phase. To address this, we propose that the trial be started without specifying $V_\psi$, that $V_\psi$ be estimated once $n_0$ individuals have been accumulated after the beginning of the trial, and that the sample size needed to detect the difference be determined using this estimate of the variance. The order of $V_\psi$ can be evaluated according to the following equation:

$V_\psi\approx\frac{A}{n},$

where $A$ is a constant. After the beginning of the trial, the variance is estimated once $n_0$ individuals have been accumulated, and the resulting value is denoted by $V_{\psi_0}$. The variance at the time when $n_1$ cases have been accumulated, for an arbitrary $n_1>n_0$, can then be approximated by the following equation:

$V_{\psi_1}\approx\frac{n_0}{n_1}V_{\psi_0}.$

Based on the above, let $\log\psi_1$ be the difference to be detected, $Z_k$ the upper $100k\%$ point of the standard normal distribution, $a$ the level of significance, and $1-b$ the power. The sample size $n_1$ needed to detect the difference with probability of at least $1-b$ can then be designed according to the following equation:

$n_1=\frac{(Z_{a/2}+Z_b)^{2}\,V_{\psi_1}}{(\log\psi_1)^{2}}.$

Using the approximation $V_{\psi_1}\approx(n_0/n_1)V_{\psi_0}$, we obtain the following equation:

$n_1=\frac{(Z_{a/2}+Z_b)\sqrt{n_0V_{\psi_0}}}{|\log\psi_1|}.$
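The step between the two displayed formulas for $n_1$ is a substitution followed by solving for $n_1$. A sketch of the algebra, using only the symbols already defined in this section:

$\begin{aligned} n_1&=\frac{(Z_{a/2}+Z_b)^{2}\,V_{\psi_1}}{(\log\psi_1)^{2}}\approx\frac{(Z_{a/2}+Z_b)^{2}}{(\log\psi_1)^{2}}\cdot\frac{n_0}{n_1}V_{\psi_0} \\ \Rightarrow\quad n_1^{2}&\approx\frac{(Z_{a/2}+Z_b)^{2}\,n_0V_{\psi_0}}{(\log\psi_1)^{2}} \\ \Rightarrow\quad n_1&\approx\frac{(Z_{a/2}+Z_b)\sqrt{n_0V_{\psi_0}}}{|\log\psi_1|}. \end{aligned}$

The square root appears because $n_1$ occurs on both sides of the first line once $V_{\psi_1}$ is replaced by its approximation.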

## Simulation

Several concrete scenarios were designed, and the behavior of $\log\hat{\psi}$ under the proposed method was examined numerically. The conditional probabilities $\Pr(T_N,T_S\mid D=1)$ and $\Pr(T_N,T_S\mid D=0)$ and the prevalence $p=\Pr(D=1)$ were specified in advance. Four patterns, pattern 1 to pattern 4, were considered (Table 1).

The patterns correspond to the four combinations of whether the prevalence is low or high and whether the new test is worse or better than the standard test. In pattern 1, the prevalence is low (p=0.05) and the new test is inferior to the standard test (logψ < 0). In pattern 2, the prevalence is high (p=0.2) and the new test is inferior to the standard test (logψ < 0). In pattern 3, the prevalence is low (p=0.05) and the new test is superior to the standard test (logψ > 0). In pattern 4, the prevalence is high (p=0.2) and the new test is superior to the standard test (logψ > 0). The true values of α, β, and ψ were calculated based on (1), (2), and the true conditional probabilities specified in Table 1. In pattern 1, for example, α=0.2/(0.2+0.15)=0.57, β=0.1/(0.1+0.2)=0.33, ψ=(0.57×0.33)/{(1-0.57)×(1-0.33)}=0.67, and logψ=log(0.67)=-0.41. The true values of α, β, ψ, and logψ in the other patterns are summarized in Table 2.
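The pattern 1 arithmetic can be reproduced with a few lines of Python. The four discordant-cell probabilities below are read off the fractions quoted in the text (0.2 and 0.15 for the diseased group, 0.1 and 0.2 for the non-diseased group); their assignment to specific cells is an assumption, since Table 1 itself is not reproduced here, and the variable names are hypothetical.

```python
import math

# Discordant-cell probabilities for pattern 1, as implied by the worked example.
q_d10, q_d01 = 0.20, 0.15    # diseased: (new=1, std=0) and (new=0, std=1)
q_nd01, q_nd10 = 0.10, 0.20  # non-diseased: (new=0, std=1) and (new=1, std=0)

# alpha = Pr(T* = 1 | D = 1), beta = Pr(T* = 0 | D = 0)
alpha = q_d10 / (q_d10 + q_d01)
beta = q_nd01 / (q_nd01 + q_nd10)

# psi from alpha and beta, and its logarithm
psi = alpha * beta / ((1.0 - alpha) * (1.0 - beta))
log_psi = math.log(psi)

print(round(alpha, 2), round(beta, 2), round(psi, 2), round(log_psi, 2))
```

The same value of ψ is obtained directly from the cell probabilities as (0.20 × 0.10)/(0.15 × 0.20), which is a quick consistency check on the two expressions for ψ.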