Jiangao Luo; Yanpin Wang; Jane Meza

Research Article

Austin Biom and Biostat. 2015;2(3): 1024.

Optimal Sample Size Allocation under Financial Constraint

Jiangao Luo¹*, Yanpin Wang² and Jane Meza¹

¹Department of Biostatistics, University of Nebraska Medical Center, USA

²First National Bank, USA

*Corresponding author: Jiangao Luo, Department of Biostatistics, College of Public Health, University of Nebraska Medical Center, 984375 Nebraska Medicine, Emile and 42nd St, Omaha, NE 68198-4375, USA

Received: March 09, 2015; Accepted: June 18, 2015; Published: July 07, 2015

Abstract

This paper discusses optimal sample sizes for the comparison of two groups under financial constraints. We study two types of optimal sample sizes: (a) those that minimize the variance of the difference and of the ratio of two independent binary proportions under a financial constraint, and (b) those that maximize the power for detecting the difference of two proportions, two survival rates and two correlations under a financial constraint.

Keywords: Sample size; Power; Lagrange method; Financial constraint

Introduction

When designing medical studies we often have to decide the optimal sample sizes for the intervention and control groups. It has been decades since Cochran [1] studied optimal sample size allocation under different sampling schemes. Allison et al. [2] considered power, sample size and financial efficiency simultaneously. Guo et al. [3] studied the sample size allocation ratio by minimizing the cost and maximizing the power. Guo and Luh [4] also studied sample size allocation for comparing two trimmed means under a given total cost. These questions matter because investigators now face funding cuts for their studies, so it is crucial to obtain the best possible clinical trial results under financial constraints. The main focus of this paper is how to obtain optimal precision for the difference and ratio of two binary proportions, and optimal power for detecting the difference of two proportions, two survival rates and two correlations, under financial constraints.

Minimal variance under financial constraints

Assume we have a clinical trial in which the sample sizes for intervention and control are n₁ and n₂, respectively. We use p₁ and p₂ to denote the proportions of binary responses in the two groups, respectively. For continuous data, we use µ₁ and µ₂ for the means. Let p = p₁ − p₂, R = p₁/p₂ and µ = µ₁ − µ₂.

We assume that $n_{1} {\hat{p}}_{1} \sim \mathrm{Bin}(n_{1}, p_{1}), n_{2} {\hat{p}}_{2} \sim \mathrm{Bin}(n_{2}, p_{2})$ and ${\hat{μ}}_{1} \sim N (μ_{1}, \frac{σ_{1}^{2}}{n_{1}}), {\hat{μ}}_{2} \sim N (μ_{2}, \frac{σ_{2}^{2}}{n_{2}})$, respectively. Under the independence assumption, and writing $q_{i} = 1 - p_{i}$,

$\mathrm{Var}(\hat{p}) = \frac{p_{1} q_{1}}{n_{1}} + \frac{p_{2} q_{2}}{n_{2}} \quad (1)$

and

$\mathrm{Var}(\hat{μ}) = \frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}} \quad (2)$

Using the delta method we get

$\mathrm{Var}(\hat{R}) = {(\frac{p_{1}}{p_{2}})}^{2} (\frac{q_{1}}{n_{1} p_{1}} + \frac{q_{2}}{n_{2} p_{2}}) \quad (3)$

Brittain and Schlesselman [5] have discussed the minimal solutions of (1) and (3) by finding the ratio $\frac{n_{1}}{n_{1} + n_{2}}$. In real situations, however, we often face a budget constraint. Say the costs per subject in the intervention group (group 1) and the control group (group 2) are C₁ and C₂, respectively, and the total cost is C. We then have the following constraint

n₁C₁ + n₂C₂ = C (4)

The optimal solution of (2) under (4) was given by Cochran [1] with

$n_{1} = \frac{C σ_{1}}{\sqrt{C_{1}} (σ_{1} \sqrt{C_{1}} + σ_{2} \sqrt{C_{2}})}, n_{2} = \frac{C σ_{2}}{\sqrt{C_{2}} (σ_{1} \sqrt{C_{1}} + σ_{2} \sqrt{C_{2}})} \quad (5)$

$n_{1} : n_{2} = σ_{1} \sqrt{C_{2}} : σ_{2} \sqrt{C_{1}} (6)$

Cochran obtained this result in the setting of optimum allocation for double sampling, where C₁ and C₂ were unit sampling costs and the structures of σ₁ and σ₂ were more complicated than here. Guo et al. [3] have proved that (6) also attains optimal power for fixed total cost.
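Allocation (5) is straightforward to compute. The following is a minimal Python sketch, not code from the paper: the function name `allocate_means` and the input values are hypothetical and chosen only for illustration.

```python
from math import sqrt

def allocate_means(C, C1, C2, sigma1, sigma2):
    """Allocation (5): minimise sigma1^2/n1 + sigma2^2/n2
    subject to the budget constraint n1*C1 + n2*C2 = C."""
    denom = sigma1 * sqrt(C1) + sigma2 * sqrt(C2)
    n1 = C * sigma1 / (sqrt(C1) * denom)
    n2 = C * sigma2 / (sqrt(C2) * denom)
    return n1, n2

# Hypothetical inputs (not taken from the paper).
C, C1, C2, sigma1, sigma2 = 10000, 40, 10, 3.0, 1.0
n1, n2 = allocate_means(C, C1, C2, sigma1, sigma2)

print(n1, n2)                      # real-valued sizes; round before use
print(n1 * C1 + n2 * C2)           # total cost, equal to C up to rounding error
print(n1 / n2)                     # allocation ratio
print(sigma1 * sqrt(C2) / (sigma2 * sqrt(C1)))  # ratio predicted by (6)
```

The two printed ratios agree, which is the content of (6). In practice the real-valued sizes have to be rounded so that the total cost stays within C.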

The optimal solution of (1) under constraint (4) is

$n_{1} = \frac{C \sqrt{p_{1} q_{1}}}{\sqrt{C_{1}} (\sqrt{p_{1} q_{1} C_{1}} + \sqrt{p_{2} q_{2} C_{2}})}, n_{2} = \frac{C \sqrt{p_{2} q_{2}}}{\sqrt{C_{2}} (\sqrt{p_{1} q_{1} C_{1}} + \sqrt{p_{2} q_{2} C_{2}})} \quad (7)$

according to Lagrange multiplier theory [6]. Therefore

$n_{1} : n_{2} = \frac{\sqrt{p_{1} q_{1}}}{\sqrt{C_{1}}} : \frac{\sqrt{p_{2} q_{2}}}{\sqrt{C_{2}}} = \sqrt{p_{1} q_{1} C_{2}} : \sqrt{p_{2} q_{2} C_{1}} \quad (8)$

but

$n_{1} : n_{2} = \sqrt{p_{1} q_{1}} : \sqrt{p_{2} q_{2}}$

if there is no financial constraint (4) according to [5].

Similarly, (3) is minimized under constraint (4) when

$n_{1} = \frac{C}{C_{1}} \frac{\sqrt{\frac{q_{1} C_{1}}{p_{1}}}}{\sqrt{\frac{q_{1} C_{1}}{p_{1}}} + \sqrt{\frac{q_{2} C_{2}}{p_{2}}}}, n_{2} = \frac{C}{C_{2}} \frac{\sqrt{\frac{q_{2} C_{2}}{p_{2}}}}{\sqrt{\frac{q_{1} C_{1}}{p_{1}}} + \sqrt{\frac{q_{2} C_{2}}{p_{2}}}} \quad (9)$

which implies

$n_{1} : n_{2} = \sqrt{\frac{q_{1}}{p_{1} C_{1}}} : \sqrt{\frac{q_{2}}{p_{2} C_{2}}} = \sqrt{\frac{q_{1} C_{2}}{p_{1}}} : \sqrt{\frac{q_{2} C_{1}}{p_{2}}} (10)$

C₁ and C₂ are extra terms compared to the corresponding result in [5].
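The claim that (9) minimizes (3) on the budget line (4) can be checked numerically. The sketch below is an illustration under assumed inputs, not part of the original analysis: the function names are hypothetical, and the parameter values are picked only to exercise the formulas.

```python
from math import sqrt

def var_ratio(n1, n2, p1, p2):
    """Delta-method variance (3) of the estimated ratio p1/p2."""
    q1, q2 = 1 - p1, 1 - p2
    return (p1 / p2) ** 2 * (q1 / (n1 * p1) + q2 / (n2 * p2))

def allocate_ratio(C, C1, C2, p1, p2):
    """Allocation (9): minimise (3) subject to n1*C1 + n2*C2 = C."""
    a1 = sqrt((1 - p1) * C1 / p1)
    a2 = sqrt((1 - p2) * C2 / p2)
    n1 = (C / C1) * a1 / (a1 + a2)
    n2 = (C / C2) * a2 / (a1 + a2)
    return n1, n2

# Illustrative inputs (assumed for this sketch).
C, C1, C2, p1, p2 = 21750, 40, 10, 0.1, 0.05
n1, n2 = allocate_ratio(C, C1, C2, p1, p2)
best = var_ratio(n1, n2, p1, p2)

# Move n1 up or down and spend the rest of the budget on group 2.
shifted = []
for shift in (-5, 5):
    m1 = n1 + shift
    m2 = (C - m1 * C1) / C2
    shifted.append(var_ratio(m1, m2, p1, p2))

print(n1, n2, best)
print(shifted)  # both values are at least as large as `best`
```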

Example: Now consider the design of an experiment with binary data, with p₁=0.1, p₂=0.05, C₁=40, C₂=10 and C=21750. To minimize the variance of $\hat{p}$ we choose n₁=399 and n₂=579 according to formula (7). This allocation gives 84% power at a significance level of 0.05 and precision $\mathrm{Var}(\hat{p}) = 0.000308$. If we choose n₁=n₂=435 instead, we get only 80% power at level 0.05 and precision $\mathrm{Var}(\hat{p}) = 0.000316$.
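The allocation and the two variances in this example can be reproduced with a few lines of Python. This is a sketch with hypothetical function names; it takes p₁ = 0.1, p₂ = 0.05, C₁ = 40, C₂ = 10 and C = 21750, and evaluates (7) and (1) directly.

```python
from math import sqrt

def allocate_difference(C, C1, C2, p1, p2):
    """Allocation (7): minimise Var(p1_hat - p2_hat) in (1)
    subject to n1*C1 + n2*C2 = C."""
    s1 = sqrt(p1 * (1 - p1))
    s2 = sqrt(p2 * (1 - p2))
    denom = s1 * sqrt(C1) + s2 * sqrt(C2)
    return C * s1 / (sqrt(C1) * denom), C * s2 / (sqrt(C2) * denom)

def var_difference(n1, n2, p1, p2):
    """Variance (1) of the estimated difference of proportions."""
    return p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2

n1, n2 = allocate_difference(21750, 40, 10, 0.1, 0.05)
print(n1, n2)  # real-valued optimum, before rounding to integers

var_optimal = var_difference(399, 579, 0.1, 0.05)  # rounded allocation from (7)
var_equal = var_difference(435, 435, 0.1, 0.05)    # equal allocation, same budget
print(var_optimal, var_equal)
```

Both integer allocations cost exactly 21750 (399·40 + 579·10 and 435·40 + 435·10), so the comparison is at equal budget.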

Maximal power with financial constraints

For two independent samples of continuous data with hypotheses

H₀: µ₁=µ₂ vs H₁: µ₁≠µ₂ (11)

the critical point for the power can be written as

$Z_{β} = \frac{| μ | - Z_{α} \sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}}}{\sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}}} = \frac{| μ |}{\sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}}} - Z_{α} \quad (12)$

which is maximized when

$\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}$

is minimized. Therefore the test of hypotheses (11) reaches maximal power under financial constraint (4) when the sample sizes are allocated according to (5) (Guo et al. [3]). In other words, solution (5) simultaneously minimizes the variance of $\hat{μ}$ and maximizes the power.
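As a rough illustration of this point, the sketch below evaluates the power implied by (12) for the allocation (5) and for an equal split costing the same total. Two things here are assumptions rather than statements from the paper: Z_α is taken to be the two-sided normal quantile at level α, and all numerical inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_means(n1, n2, delta, sigma1, sigma2, alpha=0.05):
    """Approximate power from (12): Phi(|delta| / SE - Z_alpha).
    Z_alpha is assumed to be the two-sided quantile at level alpha."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return NormalDist().cdf(abs(delta) / se - z_alpha)

# Hypothetical design: budget 10000, unit costs 40 and 10.
C, C1, C2 = 10000, 40, 10
delta, sigma1, sigma2 = 0.5, 3.0, 1.0

# Allocation (5).
denom = sigma1 * sqrt(C1) + sigma2 * sqrt(C2)
n1_opt = C * sigma1 / (sqrt(C1) * denom)
n2_opt = C * sigma2 / (sqrt(C2) * denom)

# Equal allocation with the same total cost.
n_eq = C / (C1 + C2)

power_opt = power_means(n1_opt, n2_opt, delta, sigma1, sigma2)
power_eq = power_means(n_eq, n_eq, delta, sigma1, sigma2)
print(power_opt, power_eq)  # the allocation from (5) gives the larger power
```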

The critical point for the power of hypotheses:

H₀: p₁=p₂ vs H₁: p₁≠p₂ (13)

for dichotomous data is given by Fleiss ([7])

$Z_{β} = \frac{| p | - z_{α} \sqrt{(\frac{1}{n_{1}} + \frac{1}{n_{2}}) \bar{p} \bar{q}}}{\sqrt{\frac{p_{1} q_{1}}{n_{1}} + \frac{p_{2} q_{2}}{n_{2}}}} (14)$

with

$\bar{p} = \frac{n_{1} p_{1} + n_{2} p_{2}}{n_{1} + n_{2}}, \bar{q} = 1 - \bar{p}, p = p_{1} - p_{2}$

and Z_α and Z_β are the cut-off points for type I and type II errors, respectively, of the normal distribution. The optimal solution of (14) under constraint (4) has no closed form, so it must be found with an iterative algorithm; the details can be found in [6]. As the sample size n₁+n₂→∞, however, the solution of (14) under constraint (4) is the same as (7).
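Because (14) has no closed-form maximizer under (4), a numerical search is needed. The sketch below does not implement the iterative Lagrange algorithm of [6]; it substitutes a plain exhaustive search over integer n₁, spending the remaining budget on group 2. The function names are hypothetical, Z_α is assumed to be the two-sided quantile, and the inputs are those of the first example (p₁ = 0.1, p₂ = 0.05, C₁ = 40, C₂ = 10, C = 21750).

```python
from statistics import NormalDist
from math import sqrt

def z_beta(n1, n2, p1, p2, alpha=0.05):
    """Critical point (14); Z_alpha assumed two-sided."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)
    pooled = sqrt((1 / n1 + 1 / n2) * p_bar * (1 - p_bar))
    unpooled = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (abs(p1 - p2) - z_alpha * pooled) / unpooled

def best_integer_allocation(C, C1, C2, p1, p2):
    """Exhaustive search over integer n1; n2 is the largest integer
    that keeps n1*C1 + n2*C2 within the budget C."""
    best = None
    for n1 in range(2, int(C // C1) + 1):
        n2 = int((C - n1 * C1) // C2)
        if n2 < 2:
            continue
        z = z_beta(n1, n2, p1, p2)
        if best is None or z > best[0]:
            best = (z, n1, n2)
    return best

z_best, n1_best, n2_best = best_integer_allocation(21750, 40, 10, 0.1, 0.05)
z_formula7 = z_beta(399, 579, 0.1, 0.05)  # rounded allocation from (7)

print(n1_best, n2_best, NormalDist().cdf(z_best))
print(NormalDist().cdf(z_formula7))
```

Comparing the two printed powers shows how much is lost by using the closed-form allocation (7) in place of the exact maximizer of (14) at this sample size.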

Let us consider the survival analysis with two independent samples. First assume we are going to follow the subjects until the events. Then there is no censoring. For testing the hypotheses

H₀: λ₁=λ₂ vs H₁: λ₁≠λ₂ (15)

Pasternack and Gilbert have given the following formula ([8])

$Z_{β} = \frac{| λ_{1} - λ_{2} | - z_{α} \sqrt{(\frac{1}{n_{1}} + \frac{1}{n_{2}}) {\bar{λ}}^{2}}}{\sqrt{\frac{λ_{1}^{2}}{n_{1}} + \frac{λ_{2}^{2}}{n_{2}}}} (16)$

where

$\bar{λ} = \frac{n_{1} λ_{1} + n_{2} λ_{2}}{n_{1} + n_{2}} (17)$

Again the optimal solution of (16) under constraint (4) has no closed form, and the asymptotic solution as n₁+n₂→∞ is given by

$n_{1} = \frac{C λ_{1}}{\sqrt{C_{1}} (λ_{1} \sqrt{C_{1}} + λ_{2} \sqrt{C_{2}})}, n_{2} = \frac{C λ_{2}}{\sqrt{C_{2}} (λ_{1} \sqrt{C_{1}} + λ_{2} \sqrt{C_{2}})} (18)$

Now we assume that there is censoring in the data, since most of the time we cannot follow all subjects until the events. Under some regularity conditions we have

$Z_{β} = \frac{| λ_{1} - λ_{2} | - z_{α} \sqrt{(\frac{1}{n_{1}} + \frac{1}{n_{2}}) φ (\bar{λ})}}{\sqrt{\frac{φ (λ_{1})}{n_{1}} + \frac{φ (λ_{2})}{n_{2}}}} (19)$

where $\bar{λ}$ is given by (17) and

$φ (λ) = \frac{λ^{3} T}{λ T - 1 + e^{- λ T}}$

(see [9] for details). Obviously (16) is a special case of (19) with φ(λ)=λ². We need to use the iterative Lagrange multiplier method to get the optimal solution of (19) under (4). The asymptotic solution is given by

$n_{1} = \frac{C \sqrt{φ (λ_{1})}}{\sqrt{C_{1}} (\sqrt{φ (λ_{1}) C_{1}} + \sqrt{φ (λ_{2}) C_{2}})}, n_{2} = \frac{C \sqrt{φ (λ_{2})}}{\sqrt{C_{2}} (\sqrt{φ (λ_{1}) C_{1}} + \sqrt{φ (λ_{2}) C_{2}})} (20)$
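The asymptotic allocation (20) only needs φ evaluated at the two hazard rates. A minimal sketch follows, with hypothetical function names and input values; T is used exactly as it appears in the formula for φ, and the text above does not define it further.

```python
from math import exp, sqrt

def phi(lam, T):
    """phi(lambda) = lambda^3 T / (lambda T - 1 + exp(-lambda T))."""
    return lam ** 3 * T / (lam * T - 1 + exp(-lam * T))

def allocate_censored(C, C1, C2, lam1, lam2, T):
    """Asymptotic allocation (20) under the budget constraint (4)."""
    s1, s2 = sqrt(phi(lam1, T)), sqrt(phi(lam2, T))
    denom = s1 * sqrt(C1) + s2 * sqrt(C2)
    return C * s1 / (sqrt(C1) * denom), C * s2 / (sqrt(C2) * denom)

# Hypothetical inputs chosen only for illustration.
C, C1, C2 = 10000, 40, 10
lam1, lam2, T = 0.5, 0.3, 5.0
n1, n2 = allocate_censored(C, C1, C2, lam1, lam2, T)
print(n1, n2, n1 * C1 + n2 * C2)

# For very large T, phi(lambda) approaches lambda^2, so (20) reduces to (18).
print(phi(lam1, 1e6), lam1 ** 2)
```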

In many applications it is important to detect possible difference in correlations. Suppose we have two independent samples with correlations r₁ and r₂, respectively. Our hypotheses are

H₀: r₁=r₂ vs H₁: r₁≠r₂ (21)

Then

$Z_{β} = \frac{| z_{(r_{1})} - z_{(r_{2})} |}{\sqrt{\frac{1}{n_{1} - 3} + \frac{1}{n_{2} - 3}}} - Z_{α}$ (22)

where

$z_{(r)} = \frac{1}{2} \ln \frac{1 + r}{1 - r}$ (23)

according to Fisher’s arctanh transformation [10]. The hypotheses reach maximal power under constraint (4) when

$n_{1} = 3 + \frac{C - 3 C_{1} - 3 C_{2}}{\sqrt{C_{1}} (\sqrt{C_{1}} + \sqrt{C_{2}})}, n_{2} = 3 + \frac{C - 3 C_{1} - 3 C_{2}}{\sqrt{C_{2}} (\sqrt{C_{1}} + \sqrt{C_{2}})} (24)$

As an example we prove (24), which also shows how to use Lagrange multiplier theory to prove similar results. To maximize the power we must maximize Z_β in (22). Equivalently, we only need to minimize

$\frac{1}{n_{1} - 3} + \frac{1}{n_{2} - 3} \quad (25)$

under constraint (4). The corresponding Lagrangian function, with λ now denoting the Lagrange multiplier rather than a hazard rate, is

$Q (n_{1}, n_{2}, λ) = \frac{1}{n_{1} - 3} + \frac{1}{n_{2} - 3} + λ (n_{1} C_{1} + n_{2} C_{2} - C) (26)$

and

$\begin{matrix} \frac{\partial Q}{\partial n_{1}} = - \frac{1}{{(n_{1} - 3)}^{2}} + λ C_{1} = 0 \\ \frac{\partial Q}{\partial n_{2}} = - \frac{1}{{(n_{2} - 3)}^{2}} + λ C_{2} = 0 \end{matrix}$

imply

$n_{1} = \sqrt{\frac{1}{λ C_{1}}} + 3, n_{2} = \sqrt{\frac{1}{λ C_{2}}} + 3 (27)$

Plugging (27) into (4) and solving for λ, we get

$λ = \frac{{(\sqrt{C_{1}} + \sqrt{C_{2}})}^{2}}{{(C - 3 C_{1} - 3 C_{2})}^{2}} (28)$

Now we plug (28) into (27) and obtain (24). Since the Hessian matrix of Q with respect to n₁ and n₂ is

$\begin{bmatrix} \frac{\partial^{2} Q}{\partial n_{1}^{2}} & \frac{\partial^{2} Q}{\partial n_{1} \partial n_{2}} \\ \frac{\partial^{2} Q}{\partial n_{2} \partial n_{1}} & \frac{\partial^{2} Q}{\partial n_{2}^{2}} \end{bmatrix} = \begin{bmatrix} \frac{2}{{(n_{1} - 3)}^{3}} & 0 \\ 0 & \frac{2}{{(n_{2} - 3)}^{3}} \end{bmatrix}$

which is positive definite for n₁, n₂ > 3, (24) minimizes (25) under constraint (4). Therefore it maximizes the power.

Other results can be proved similarly.
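The result (24) can also be checked numerically by scanning the budget line (4) and confirming that no other point gives a smaller value of (25). The sketch below uses hypothetical function names and inputs assumed for illustration only.

```python
from math import sqrt

def allocate_correlations(C, C1, C2):
    """Allocation (24): minimise 1/(n1-3) + 1/(n2-3)
    subject to n1*C1 + n2*C2 = C."""
    slack = C - 3 * C1 - 3 * C2
    root_sum = sqrt(C1) + sqrt(C2)
    n1 = 3 + slack / (sqrt(C1) * root_sum)
    n2 = 3 + slack / (sqrt(C2) * root_sum)
    return n1, n2

def objective(n1, n2):
    """Expression (25), proportional to the variance of z(r1) - z(r2)."""
    return 1 / (n1 - 3) + 1 / (n2 - 3)

# Hypothetical budget and unit costs.
C, C1, C2 = 10000, 400, 100
n1, n2 = allocate_correlations(C, C1, C2)
best = objective(n1, n2)

# Scan n1 on a fine grid; n2 is whatever the remaining budget buys.
grid = [4 + 0.01 * k for k in range(2000)]
scan_min = min(objective(m1, (C - m1 * C1) / C2) for m1 in grid)

print(n1, n2, best)
print(scan_min)  # never falls below `best`
```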

Example: Suppose p₁=0.6, q₁=0.4, p₂=0.2, q₂=0.8, C₁=$400, C₂=$100, and C=$10,000. If we want to test the difference at a significance level of 0.05 with 80% power and equal sample sizes, then n₁=n₂=23. This allocation gives a total cost of $11,500, which is over budget. If we use n₁=n₂=20 our power will be 75%, which is usually not acceptable. Plugging all the parameters into formula (7), we get n₁=17.75 and n₂=28.99. Choosing n₁=18 and n₂=28 gives power of 80% and an actual type I error of 0.043, which is what we want.
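The power figures in this example are consistent with formula (14) when Z_α is read as the two-sided normal quantile, which is an assumption of the sketch below rather than something the text states. The function name is hypothetical, and the budget is taken as C = $10,000, the value consistent with n₁ = 17.75 and n₂ = 28.99 from (7) at unit costs of $400 and $100.

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(n1, n2, p1, p2, alpha=0.05):
    """Power implied by (14); Z_alpha assumed two-sided."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)
    pooled = sqrt((1 / n1 + 1 / n2) * p_bar * (1 - p_bar))
    unpooled = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return NormalDist().cdf((abs(p1 - p2) - z_alpha * pooled) / unpooled)

p1, p2, C1, C2 = 0.6, 0.2, 400, 100

results = {}
for n1, n2 in [(23, 23), (20, 20), (18, 28)]:
    cost = n1 * C1 + n2 * C2
    results[(n1, n2)] = (cost, power_two_proportions(n1, n2, p1, p2))
    print(n1, n2, cost, results[(n1, n2)][1])
```

The unequal allocation (18, 28) costs the same as (20, 20) but recovers roughly the power of the more expensive (23, 23) design. The type I error of 0.043 quoted in the example is not reproduced by this sketch.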

Conclusion and Discussion

Cost constraints have an important impact on the design of experimental studies. We have studied optimal sample allocation to achieve maximal precision and power under a total financial constraint for the comparison of two samples. The results are easy to program and therefore have broad applications. We must point out, however, that there are limits: the problems have been simplified, and we do not consider recruitment and related costs. For rare diseases, the minimal sample size for fixed power and false positive rate is more important than a fixed cost, because recruiting is difficult.

Acknowledgement

We thank Dr. James Anderson for reading the manuscript and providing useful suggestions. We also appreciate the helpful comments of an anonymous referee.