Austin Biom and Biostat. 2015;2(3): 1024.
Abstract
Optimal sample sizes for comparing two groups under financial constraints
are discussed in this paper. We study two types of optimal sample sizes under
a financial constraint: (a) those that minimize the variance of the difference and of the ratio
of two independent proportions, and (b) those that maximize power
for detecting the difference between two proportions, two survival rates and two
correlations.
Keywords: Sample size; Power; Lagrange method; Financial constraint
Introduction
In designing medical studies we often have to decide the
optimal sample sizes for the intervention and control groups. It has been
decades since Cochran [1] studied optimal sample size allocation
under different sampling schemes. Allison et al. [2] considered
power, sample size and financial efficiency simultaneously. Guo et al.
[3] studied the sample size allocation ratio by minimizing the
cost and maximizing the power. Guo and Luh [4] also studied
sample size allocation for comparing two trimmed means under a given
total cost. These questions are important because investigators nowadays
face funding cuts for their studies. It is therefore crucial to get
the best possible clinical trial results under a restricted budget.
The main focus of this paper is how to obtain optimal precision
for the difference and ratio of two proportions, and optimal power for detecting the
difference between two proportions, two survival rates and two correlations,
under financial constraints.
Minimal variance under financial constraints
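Assume we have a clinical trial in which the sample sizes for
intervention and control are n1 and n2, respectively. We use p1 and
p2 to denote the proportions for binary responses in the two groups,
respectively, with qi = 1 - pi. For continuous data, we use µ1 and µ2 for the means. Let
p = p1 - p2, R = p1/p2 and µ = µ1 - µ2.
We assume that the continuous responses in the two groups have variances σ1² and σ2², respectively. Under the independence assumption
$$\mathrm{Var}(\hat p) = \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2} \qquad (1)$$
and
$$\mathrm{Var}(\hat \mu) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \qquad (2)$$
Using the delta method we can get
$$\mathrm{Var}(\hat R) \approx R^2 \left( \frac{q_1}{n_1 p_1} + \frac{q_2}{n_2 p_2} \right) \qquad (3)$$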
Brittain and Schlesselman [5] have discussed the solutions
that minimize (1) and (3) by finding the ratio n1/n2. In real
situations, however, we often face a budget constraint. Say the costs per subject in the intervention group (group 1) and the control group (group 2) are C1 and C2,
respectively, and the total cost is C. Therefore, we have the following
constraint
n1C1 + n2C2 = C (4)
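The optimal solution of (2) under (4) was given by Cochran [1]
with
$$\frac{n_1}{n_2} = \frac{\sigma_1}{\sigma_2} \sqrt{\frac{C_2}{C_1}} \qquad (5)$$
or
$$n_1 = \frac{C \sigma_1 / \sqrt{C_1}}{\sigma_1 \sqrt{C_1} + \sigma_2 \sqrt{C_2}}, \quad n_2 = \frac{C \sigma_2 / \sqrt{C_2}}{\sigma_1 \sqrt{C_1} + \sigma_2 \sqrt{C_2}} \qquad (6)$$
where σ1 and σ2 are the standard deviations in the two groups.
Cochran obtained this result in the setting of optimum allocation
for double sampling, where C1 and C2 were the unit sampling costs
and the structures of σ1 and σ2 were more complicated than here. Guo
et al. [3] have proved that (6) also attains optimal power for a fixed
total cost.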
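The optimal solution of (1) under constraint (4) is
$$n_1 = \frac{C \sqrt{p_1 q_1 / C_1}}{\sqrt{C_1 p_1 q_1} + \sqrt{C_2 p_2 q_2}}, \quad n_2 = \frac{C \sqrt{p_2 q_2 / C_2}}{\sqrt{C_1 p_1 q_1} + \sqrt{C_2 p_2 q_2}} \qquad (7)$$
according to Lagrange multiplier theory [6], where qi = 1 - pi. Therefore
$$\frac{n_1}{n_2} = \sqrt{\frac{p_1 q_1 C_2}{p_2 q_2 C_1}}$$
but
$$\frac{n_1}{n_2} = \sqrt{\frac{p_1 q_1}{p_2 q_2}}$$
if there is no financial constraint (4) according to [5].
Similarly, (3) is minimized under constraint (4) when
$$n_1 = \frac{C \sqrt{q_1 / (p_1 C_1)}}{\sqrt{C_1 q_1 / p_1} + \sqrt{C_2 q_2 / p_2}}, \quad n_2 = \frac{C \sqrt{q_2 / (p_2 C_2)}}{\sqrt{C_1 q_1 / p_1} + \sqrt{C_2 q_2 / p_2}}$$
which implies
$$\frac{n_1}{n_2} = \sqrt{\frac{q_1 p_2 C_2}{p_1 q_2 C_1}}$$
The factors involving C1 and C2 are the extra terms compared to the corresponding results in [5].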
Example: Now consider the design of an experiment with binary
data where p1=0.1, p2=0.05, C1=40, C2=10 and C=21750. To minimize
the variance of the estimated difference p̂ we choose n1=399 and n2=579 according to our
formula (7). This sample size allocation gives us 84% power at a
significance level of 0.05 and precision Var(p̂) = p1q1/n1 + p2q2/n2 ≈ 3.08 × 10⁻⁴. If we choose
n1=n2=435 then we only get 80% power at level 0.05 and precision
Var(p̂) ≈ 3.16 × 10⁻⁴.
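The numbers in this example can be checked with a short Python sketch. It assumes that the cost-constrained allocation has the Lagrange-multiplier form n_i = C sqrt(p_i q_i / C_i) / (sqrt(C1 p1 q1) + sqrt(C2 p2 q2)) and that the variance of the estimated difference is p1q1/n1 + p2q2/n2. The function names are illustrative and do not come from the paper.

```python
import math


def allocate_two_proportions(p1, p2, c1, c2, budget):
    """Real-valued (n1, n2) minimizing p1*q1/n1 + p2*q2/n2
    subject to n1*c1 + n2*c2 = budget (assumed closed form)."""
    s1 = math.sqrt(p1 * (1 - p1))
    s2 = math.sqrt(p2 * (1 - p2))
    denom = s1 * math.sqrt(c1) + s2 * math.sqrt(c2)
    n1 = budget * s1 / (math.sqrt(c1) * denom)
    n2 = budget * s2 / (math.sqrt(c2) * denom)
    return n1, n2


def var_difference(p1, p2, n1, n2):
    """Variance of the estimated difference of two independent proportions."""
    return p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2


# Parameters of the example in the text.
p1, p2, c1, c2, budget = 0.1, 0.05, 40, 10, 21750

n1, n2 = allocate_two_proportions(p1, p2, c1, c2, budget)
var_optimal = var_difference(p1, p2, 399, 579)   # rounded optimal allocation
var_equal = var_difference(p1, p2, 435, 435)     # equal allocation, same cost

print(f"n1 = {n1:.2f}, n2 = {n2:.2f}")
print(f"Var optimal = {var_optimal:.3e}, Var equal = {var_equal:.3e}")
```

The real-valued solution is about n1 = 398.9 and n2 = 579.5. Taking n1=399 and n2=579 spends the budget exactly (399 × 40 + 579 × 10 = 21750), as does n1=n2=435.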
Maximal power with financial constraints
For two independent samples of continuous data with hypotheses
H0: µ1= µ2 vs H1: µ1≠ µ2 (11)
the critical point for the power can be written as
$$Z_\beta = \frac{|\mu_1 - \mu_2|}{\sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}} - Z_\alpha \qquad (12)$$
which is maximized when
$$\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
is minimized. Therefore the test for hypotheses (11) reaches maximal
power under financial constraint (4) when the sample sizes are
allocated according to (5) (Guo et al. [3]). Namely, the solution (5)
simultaneously minimizes the variance and maximizes the power.
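The claim that one allocation both minimizes the variance and maximizes the power can be illustrated numerically. The sketch below assumes a two-sided normal-approximation test with power Φ(|µ1 - µ2| / sqrt(σ1²/n1 + σ2²/n2) - Zα) and the allocation n_i = C (σ_i / sqrt(C_i)) / (σ1 sqrt(C1) + σ2 sqrt(C2)). All parameter values and function names are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def power_two_means(delta, sigma1, sigma2, n1, n2, alpha=0.05):
    """Approximate power of the two-sided z-test for a mean difference delta."""
    se = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(delta) / se - z_alpha)


def closed_form_allocation(sigma1, sigma2, c1, c2, budget):
    """Assumed closed-form allocation under n1*c1 + n2*c2 = budget."""
    denom = sigma1 * math.sqrt(c1) + sigma2 * math.sqrt(c2)
    n1 = budget * sigma1 / (math.sqrt(c1) * denom)
    n2 = budget * sigma2 / (math.sqrt(c2) * denom)
    return n1, n2


# Hypothetical design: equal variances, group 1 four times as expensive.
delta, sigma1, sigma2 = 0.5, 1.0, 1.0
c1, c2, budget = 40.0, 10.0, 5000.0

n1_star, n2_star = closed_form_allocation(sigma1, sigma2, c1, c2, budget)


def power_on_budget_line(n1):
    """Power when n1 is fixed and the rest of the budget goes to group 2."""
    n2 = (budget - c1 * n1) / c2
    return power_two_means(delta, sigma1, sigma2, n1, n2)


# Scan every integer n1 that leaves a positive n2 and keep the most powerful.
best_n1 = max(range(1, int(budget // c1)), key=power_on_budget_line)

power_star = power_two_means(delta, sigma1, sigma2, n1_star, n2_star)
equal_n = budget / (c1 + c2)
power_equal = power_two_means(delta, sigma1, sigma2, equal_n, equal_n)

print(n1_star, n2_star, best_n1, power_star, power_equal)
```

In this hypothetical design the scan lands on the integer closest to the closed-form n1, and the cost-adjusted allocation has higher power than the equal allocation that costs the same.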
The critical point for the power of hypotheses:
H0: p1=p2 vs H1: p1≠p2 (13)
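for dichotomous data is given by Fleiss [7]:
$$Z_\beta = \frac{|p_1 - p_2| - Z_\alpha \sqrt{\bar p \, \bar q \, (1/n_1 + 1/n_2)}}{\sqrt{p_1 q_1 / n_1 + p_2 q_2 / n_2}} \qquad (14)$$
with
$$\bar p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}, \quad \bar q = 1 - \bar p,$$
where Zα and Zβ are the cut-off points for type I and type II errors, respectively,
of the normal distribution. The optimal solution of (14) under
constraint (4) has no closed form and can only be obtained with an iterative
algorithm. The details can be found in [6]. But as the sample size
n1+n2→∞ the solution of (14) under constraint (4) is the same as (7).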
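A hedged Python sketch of this power calculation follows. It assumes that the critical point has the Fleiss form Zβ = (|p1 - p2| - Zα sqrt(p̄ q̄ (1/n1 + 1/n2))) / sqrt(p1q1/n1 + p2q2/n2), with p̄ the sample-size-weighted average of p1 and p2 and Zα the two-sided normal quantile. The exhaustive integer search is a simple stand-in for the iterative Lagrange-type algorithm of [6], not the authors' procedure, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def power_two_proportions(p1, p2, n1, n2, alpha=0.05):
    """Approximate power for H0: p1 = p2 using a pooled null variance."""
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)
    null_se = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    alt_se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = (abs(p1 - p2) - z_alpha * null_se) / alt_se
    return NormalDist().cdf(z_beta)


def best_integer_allocation(p1, p2, c1, c2, budget, alpha=0.05):
    """Try every integer n1, spend the remaining budget on group 2,
    and return the (n1, n2, power) triple with the highest power."""
    best = None
    for n1 in range(2, int(budget // c1)):
        n2 = int((budget - n1 * c1) // c2)
        if n2 < 2:
            continue
        pw = power_two_proportions(p1, p2, n1, n2, alpha)
        if best is None or pw > best[2]:
            best = (n1, n2, pw)
    return best


# Parameters of the first example in the text.
p1, p2, c1, c2, budget = 0.1, 0.05, 40, 10, 21750

power_formula7 = power_two_proportions(p1, p2, 399, 579)
power_equal = power_two_proportions(p1, p2, 435, 435)
n1_best, n2_best, power_best = best_integer_allocation(p1, p2, c1, c2, budget)

print(f"allocation from (7): {power_formula7:.3f}, equal: {power_equal:.3f}")
print(f"search: n1={n1_best}, n2={n2_best}, power={power_best:.3f}")
```

Under these assumptions the allocation n1=399, n2=579 gives a power of about 0.84 and the equal allocation n1=n2=435 about 0.80, in line with the first example.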
Let us consider the survival analysis with two independent
samples. First assume we are going to follow the subjects until the
events. Then there is no censoring. For testing the hypotheses
H0: λ1=λ2 vs H1: λ1≠λ2 (15)
Pasternack and Gilbert have given the following formula ([8])
where
Again the optimal solution for (16) under constraint (4) has no closed
form and asymptotic solution as n1+n2→∞ is given by
Now we assume that there is censoring in the data, since most
of the time we cannot follow all subjects until the events. Under some
regular conditions we have
where λ is given by (17) and
(see [9] for details). Obviously (16) is a special case of (19) with
φ(λ) = λ². We need to use the iterative Lagrange multiplier method to
get the optimal solution of (19) under (4). The asymptotic solution is
given by
In many applications it is important to detect possible difference
in correlations. Suppose we have two independent samples with
correlations r1 and r2, respectively. Our hypotheses are
H0: r1=r2 vs H1: r1≠r2 (21)
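Then
$$Z_\beta = \frac{|z_1 - z_2|}{\sqrt{1/(n_1 - 3) + 1/(n_2 - 3)}} - Z_\alpha \qquad (22)$$
where
$$z_i = \frac{1}{2} \ln \frac{1 + r_i}{1 - r_i}, \quad i = 1, 2 \qquad (23)$$
according to Fisher's arctanh transformation [10]. The hypotheses
reach maximal power under constraint (4) when
$$n_1 = 3 + \frac{C - 3(C_1 + C_2)}{\sqrt{C_1} (\sqrt{C_1} + \sqrt{C_2})}, \quad n_2 = 3 + \frac{C - 3(C_1 + C_2)}{\sqrt{C_2} (\sqrt{C_1} + \sqrt{C_2})} \qquad (24)$$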
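As an example we prove (24) and show how
Lagrange multiplier theory can be used to prove similar results. To
maximize the power we must maximize Zβ in (22). Equivalently, we
only need to minimize
$$\frac{1}{n_1 - 3} + \frac{1}{n_2 - 3} \qquad (25)$$
under constraint (4). So the corresponding Lagrange multiplier
function is
$$Q = \frac{1}{n_1 - 3} + \frac{1}{n_2 - 3} + \lambda (n_1 C_1 + n_2 C_2 - C) \qquad (26)$$
and
$$\frac{\partial Q}{\partial n_1} = -\frac{1}{(n_1 - 3)^2} + \lambda C_1 = 0, \quad \frac{\partial Q}{\partial n_2} = -\frac{1}{(n_2 - 3)^2} + \lambda C_2 = 0$$
imply
$$n_1 = 3 + \frac{1}{\sqrt{\lambda C_1}}, \quad n_2 = 3 + \frac{1}{\sqrt{\lambda C_2}} \qquad (27)$$
Plugging (27) in (4) and solving for λ, we get
$$\lambda = \left( \frac{\sqrt{C_1} + \sqrt{C_2}}{C - 3(C_1 + C_2)} \right)^2 \qquad (28)$$
Now we plug (28) in (27) and obtain (24). Since the Hessian matrix of
Q with respect to n1 and n2 is
$$\begin{pmatrix} 2/(n_1 - 3)^3 & 0 \\ 0 & 2/(n_2 - 3)^3 \end{pmatrix}$$
which is positive definite for n1, n2 > 3, (24) is the minimum solution for (26).
Therefore it maximizes the power.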
Other results can be proved similarly.
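The Lagrange argument for two correlations can be checked numerically. The sketch below assumes that the quantity to minimize is the Fisher-transform variance 1/(n1 - 3) + 1/(n2 - 3) and that the resulting closed form is n_i = 3 + (C - 3(C1 + C2)) / (sqrt(C_i) (sqrt(C1) + sqrt(C2))). The cost values are hypothetical and the function names are illustrative.

```python
import math


def allocate_two_correlations(c1, c2, budget):
    """Assumed closed-form minimizer of 1/(n1-3) + 1/(n2-3)
    subject to n1*c1 + n2*c2 = budget."""
    slack = budget - 3.0 * (c1 + c2)
    if slack <= 0:
        raise ValueError("budget must exceed 3 * (c1 + c2)")
    root_sum = math.sqrt(c1) + math.sqrt(c2)
    n1 = 3.0 + slack / (math.sqrt(c1) * root_sum)
    n2 = 3.0 + slack / (math.sqrt(c2) * root_sum)
    return n1, n2


def fisher_variance(n1, n2):
    """Variance of the difference of two Fisher z-transformed correlations."""
    return 1.0 / (n1 - 3.0) + 1.0 / (n2 - 3.0)


# Hypothetical per-subject costs and total budget.
c1, c2, budget = 400.0, 100.0, 10000.0
n1_star, n2_star = allocate_two_correlations(c1, c2, budget)


def variance_on_budget_line(n1):
    """Objective when n1 is fixed and the rest of the budget goes to group 2."""
    n2 = (budget - c1 * n1) / c2
    return fisher_variance(n1, n2)


# Independent check: scan n1 from 4.00 to 20.99 in steps of 0.01.
grid = [4 + 0.01 * k for k in range(1700)]
best_n1 = min(grid, key=variance_on_budget_line)

print(n1_star, n2_star, best_n1)
```

The grid scan does not use the closed form, so its agreement with n1_star (to within the grid step) is an independent check of the derivation for these hypothetical costs.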
Example: Suppose p1=0.6, q1=0.4, p2=0.2, q2=0.8, C1=$400,
C2=$100, and C=$10,000. If we want to test the difference at a significance
level of 0.05 with 80% power and equal sample sizes, then n1=n2=23. This
sample size allocation gives a total cost of $11,500, which
is over the budget. If we use n1=n2=20 our power will be 75%, which
is usually not acceptable. Now plug all the parameters into our formula
(7); we get n1=17.75 and n2=28.99. Choosing n1=18 and n2=28
we get a power of 80% and an actual type I error of 0.043, which is
what we want.
Conclusion and Discussion
Cost constraints have an important impact on the design of
experimental studies. We have studied optimal sample allocation to
achieve maximal precision and power under a total financial constraint
for the comparison of two samples. The results are easy to program and
therefore have broad applications. We must point out, however, that there
are limits: the problems have been simplified, and we do not
consider recruitment and related costs. For rare diseases, the minimal
sample size for fixed power and false positive rate is more important
than a fixed cost because of the difficulty in recruiting. Within these limits, the applicability of the
results is clear.
Acknowledgement
Thanks are given to Dr. James Anderson for reading the manuscript and providing useful suggestions. We appreciate the
comments and suggestions of an anonymous referee.
References
1. Cochran W. Sampling Techniques, 3rd edn. Wiley, New York. 1977; 448.
2. Allison DB, Allison RL, Faith MS, Paultre F, Pi-Sunyer FX. Power and Money:
Designing Statistically Powerful Studies While Minimizing Financial Costs.
Psychological Methods. 1997; 2: 20-33.
3. Guo JH, Chen HJ, Luh WM. Sample size planning with the cost constraint for
testing superiority and equivalence of two independent groups. Br J Math Stat
Psychol. 2011; 64: 439-461.
4. Guo JH, Luh WM. Optimum sample size allocation to minimize cost or
maximize power for the two-sample trimmed mean test. Br J Math Stat
Psychol. 2009; 62: 283-298.
5. Brittain E, Schlesselman JJ. Optimal allocation for the comparison of
proportions. Biometrics. 1982; 38: 1003-1009.
6. Bertsekas DP. Nonlinear Programming. Athena Scientific, Belmont, MA.
1999.
7. Fleiss J. Statistical Methods for Rates and Proportions. Wiley, New York.
1973.
8. Pasternack BS, Gilbert HS. Planning the duration of long-term survival time
studies designed for accrual by cohorts. J Chronic Dis. 1971; 24: 681-700.
9. Gross A, Clark V. Survival Distributions: Reliability Applications in Biomedical
Science. Wiley, New York. 1975.
10. Freiman JA, Chalmers TC, Smith H Jr, Kuebler RR. The importance of
beta, the type II error and sample size in the design and interpretation of the
randomized control trial. Survey of 71 “negative” trials. N Engl J Med. 1978;
299: 690-694.