Research Article

Austin Biom and Biostat. 2015;2(3): 1024.

# Optimal Sample Size Allocation under Financial Constraint

Jiangao Luo¹*, Yanpin Wang² and Jane Meza¹

¹Department of Biostatistics, University of Nebraska Medical Center, USA

²First National Bank, USA

**\*Corresponding author:** Jiangao Luo, Department of Biostatistics, College of Public Health, University of Nebraska Medical Center, 984375 Nebraska Medicine, Emile and 42nd St, Omaha, NE 68198-4375, USA

**Received:** March 09, 2015; **Accepted:** June 18, 2015; **Published:** July 07, 2015

## Abstract

This paper discusses optimal sample sizes for comparing two groups under a financial constraint. We study two types of optimal allocation under the constraint: (a) minimizing the variance of the estimated difference and ratio of two independent binomial proportions, and (b) maximizing the power for detecting a difference between two proportions, two survival rates, or two correlations.

**Keywords:** Sample size; Power; Lagrange method; Financial constraint

## Introduction

When designing medical studies we often have to decide the optimal sample sizes for the intervention and control groups. It has been decades since Cochran [1] studied optimal sample size allocation under different sampling schemes. Allison et al. [2] considered power, sample size and financial efficiency simultaneously. Guo et al. [3] studied the sample size allocation ratio by minimizing the cost and maximizing the power. Guo and Luh [4] also studied sample size allocation for comparing two trimmed means under a given total cost. These questions matter because investigators increasingly face funding cuts, so it is crucial to obtain the best possible clinical trial results under budget cuts and constraints. The main focus of this paper is how to obtain optimal precision for the difference and ratio of two binomial proportions, and optimal power for detecting the difference between two proportions, two survival rates and two correlations, under financial constraints.

## Minimal variance under financial constraints

Assume we have a clinical trial in which the sample sizes for intervention and control are n_{1} and n_{2}, respectively. We use p_{1} and p_{2} to denote the proportions for binary responses in the two groups, with q_{i} = 1 - p_{i}. For continuous data, we use µ_{1} and µ_{2} for the means. Let p = p_{1} - p_{2}, R = p_{1}/p_{2} and µ = µ_{1} - µ_{2}.

We assume that ${n}_{1}{\widehat{p}}_{1}\sim Bin\left({n}_{1},{p}_{1}\right),\quad {n}_{2}{\widehat{p}}_{2}\sim Bin\left({n}_{2},{p}_{2}\right)$ and ${\widehat{\mu}}_{1}\sim N\left({\mu}_{1},\frac{{\sigma}_{1}^{2}}{{n}_{1}}\right),\quad {\widehat{\mu}}_{2}\sim N\left({\mu}_{2},\frac{{\sigma}_{2}^{2}}{{n}_{2}}\right),$ respectively. Under the independence assumption,

$Var\left(\widehat{p}\right)=\frac{{p}_{1}{q}_{1}}{{n}_{1}}+\frac{{p}_{2}{q}_{2}}{{n}_{2}}\text{(1)}$

and

$Var\left(\widehat{\mu}\right)=\frac{{\sigma}_{1}^{2}}{{n}_{1}}+\frac{{\sigma}_{2}^{2}}{{n}_{2}}\text{(2)}$

Using the delta method we get

$Var\left(\widehat{R}\right)={\left(\frac{{p}_{1}}{{p}_{2}}\right)}^{2}\left(\frac{{q}_{1}}{{n}_{1}{p}_{1}}+\frac{{q}_{2}}{{n}_{2}{p}_{2}}\right)\text{(3)}$

Brittain and Schlesselman [5] have discussed the minimal solutions of (1) and (3) by finding the ratio $\frac{{n}_{1}}{{n}_{1}+{n}_{2}}$. In real situations, however, we often face a budget constraint. Say the costs per subject in the intervention (group 1) and control (group 2) are C_{1} and C_{2}, respectively, and the total cost is C. We then have the following constraint

n_{1}C_{1} + n_{2}C_{2} = C (4)

The optimal solution of (2) under (4) was given by Cochran [1] with

${n}_{1}=\frac{C{\sigma}_{1}}{\sqrt{{C}_{1}}\left({\sigma}_{1}\sqrt{{C}_{1}}+{\sigma}_{2}\sqrt{{C}_{2}}\right)},\quad {n}_{2}=\frac{C{\sigma}_{2}}{\sqrt{{C}_{2}}\left({\sigma}_{1}\sqrt{{C}_{1}}+{\sigma}_{2}\sqrt{{C}_{2}}\right)}\text{(5)}$

or

${n}_{1}:{n}_{2}={\sigma}_{1}\sqrt{{C}_{2}}:{\sigma}_{2}\sqrt{{C}_{1}}\text{(6)}$

Cochran obtained this result in the setting of optimum allocation for double sampling, where C_{1} and C_{2} were the unit sampling costs and the structures of σ_{1} and σ_{2} were more complicated than here. Guo et al. [3] have proved that (6) also attains optimal power for a fixed total cost.
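As a rough illustration of (5) and (6), the sketch below evaluates the closed-form allocation and checks that it spends the budget exactly and reproduces the ratio in (6). The function name and the numeric inputs are hypothetical and chosen only for illustration; they do not come from the paper.

```python
import math

def cochran_allocation(sigma1, sigma2, c1, c2, budget):
    """Allocation (5): minimizes sigma1^2/n1 + sigma2^2/n2
    subject to n1*c1 + n2*c2 = budget."""
    denom = sigma1 * math.sqrt(c1) + sigma2 * math.sqrt(c2)
    n1 = budget * sigma1 / (math.sqrt(c1) * denom)
    n2 = budget * sigma2 / (math.sqrt(c2) * denom)
    return n1, n2

# Hypothetical inputs (not from the paper).
sigma1, sigma2, c1, c2, budget = 2.0, 1.0, 40.0, 10.0, 10000.0
n1, n2 = cochran_allocation(sigma1, sigma2, c1, c2, budget)

# The constraint (4) should hold, and n1/n2 should match the ratio in (6).
cost = n1 * c1 + n2 * c2
ratio_from_6 = (sigma1 * math.sqrt(c2)) / (sigma2 * math.sqrt(c1))
print(n1, n2, cost, n1 / n2, ratio_from_6)
```

With these inputs the larger standard deviation in group 1 is offset by its higher unit cost, so the two groups end up the same size.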

The optimal solution of (1) under constraint (4) is

${n}_{1}=\frac{C\sqrt{{p}_{1}{q}_{1}}}{\sqrt{{C}_{1}}\left(\sqrt{{p}_{1}{q}_{1}{C}_{1}}+\sqrt{{p}_{2}{q}_{2}{C}_{2}}\right)},\quad {n}_{2}=\frac{C\sqrt{{p}_{2}{q}_{2}}}{\sqrt{{C}_{2}}\left(\sqrt{{p}_{1}{q}_{1}{C}_{1}}+\sqrt{{p}_{2}{q}_{2}{C}_{2}}\right)}\text{(7)}$

according to Lagrange multiplier theory [6]. Therefore

${n}_{1}:{n}_{2}=\frac{\sqrt{{p}_{1}{q}_{1}}}{\sqrt{{C}_{1}}}:\frac{\sqrt{{p}_{2}{q}_{2}}}{\sqrt{{C}_{2}}}=\sqrt{{p}_{1}{q}_{1}{C}_{2}}:\sqrt{{p}_{2}{q}_{2}{C}_{1}}\text{(8)}$

but

${n}_{1}:{n}_{2}=\sqrt{{p}_{1}{q}_{1}}:\sqrt{{p}_{2}{q}_{2}}$

if there is no financial constraint (4) according to [5].

Similarly, (3) is minimized under constraint (4) when

${n}_{1}=\frac{C}{{C}_{1}}\frac{\sqrt{\frac{{q}_{1}{C}_{1}}{{p}_{1}}}}{\sqrt{\frac{{q}_{1}{C}_{1}}{{p}_{1}}}+\sqrt{\frac{{q}_{2}{C}_{2}}{{p}_{2}}}},\quad {n}_{2}=\frac{C}{{C}_{2}}\frac{\sqrt{\frac{{q}_{2}{C}_{2}}{{p}_{2}}}}{\sqrt{\frac{{q}_{1}{C}_{1}}{{p}_{1}}}+\sqrt{\frac{{q}_{2}{C}_{2}}{{p}_{2}}}}\text{(9)}$

which implies

${n}_{1}:{n}_{2}=\sqrt{\frac{{q}_{1}}{{p}_{1}{C}_{1}}}:\sqrt{\frac{{q}_{2}}{{p}_{2}{C}_{2}}}=\sqrt{\frac{{q}_{1}{C}_{2}}{{p}_{1}}}:\sqrt{\frac{{q}_{2}{C}_{1}}{{p}_{2}}}\text{(10)}$

Compared with the corresponding result in [5], the ratio (10) carries the extra cost terms C_{1} and C_{2}.

**Example:** Consider the design of an experiment with binary data where p_{1}=0.1, p_{2}=0.05, C_{1}=40, C_{2}=10 and C=21750. To minimize the variance of $\widehat{p}$ we choose n_{1}=399 and n_{2}=579 according to formula (7). This allocation gives 84% power at a significance level of 0.05 and precision $Var\left(\widehat{p}\right)=0.000308$. If we instead choose n_{1}=n_{2}=435, we get only 80% power at level 0.05 and precision $Var\left(\widehat{p}\right)=0.000316$.
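The numbers in this example can be checked with a few lines of code. The sketch below evaluates (7) for the stated inputs and compares the variance (1) of the cost-optimal design with that of the equal-size design; the function names are ours and are not part of the paper.

```python
import math

def allocation_for_difference(p1, p2, c1, c2, budget):
    """Allocation (7): minimizes Var(p1_hat - p2_hat) from (1)
    subject to n1*c1 + n2*c2 = budget."""
    s1 = math.sqrt(p1 * (1 - p1))
    s2 = math.sqrt(p2 * (1 - p2))
    denom = s1 * math.sqrt(c1) + s2 * math.sqrt(c2)
    n1 = budget * s1 / (math.sqrt(c1) * denom)
    n2 = budget * s2 / (math.sqrt(c2) * denom)
    return n1, n2

def var_difference(p1, p2, n1, n2):
    """Variance (1) of the estimated difference of two proportions."""
    return p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2

p1, p2, c1, c2, budget = 0.1, 0.05, 40, 10, 21750
n1, n2 = allocation_for_difference(p1, p2, c1, c2, budget)
print(round(n1, 2), round(n2, 2))   # unrounded solution of (7)

var_optimal = var_difference(p1, p2, 399, 579)   # rounded optimal design
var_equal = var_difference(p1, p2, 435, 435)     # equal-size design
print(var_optimal, var_equal)
```

Both designs cost exactly 21750, so the comparison isolates the effect of the allocation.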

## Maximal power with financial constraints

For two independent samples of continuous data with hypotheses

H_{0}: µ_{1}=µ_{2} vs H_{1}: µ_{1}≠µ_{2} (11)

the critical point for the power can be written as

${Z}_{\beta}=\frac{\left|\mu \right|-{z}_{\alpha}\sqrt{\frac{{\sigma}_{1}^{2}}{{n}_{1}}+\frac{{\sigma}_{2}^{2}}{{n}_{2}}}}{\sqrt{\frac{{\sigma}_{1}^{2}}{{n}_{1}}+\frac{{\sigma}_{2}^{2}}{{n}_{2}}}}=\frac{\left|\mu \right|}{\sqrt{\frac{{\sigma}_{1}^{2}}{{n}_{1}}+\frac{{\sigma}_{2}^{2}}{{n}_{2}}}}-{z}_{\alpha}\text{(12)}$

which is maximized when

$\frac{{\sigma}_{1}^{2}}{{n}_{1}}+\frac{{\sigma}_{2}^{2}}{{n}_{2}}$

is minimized. Therefore the test for hypotheses (11) reaches maximal power under financial constraint (4) when the sample sizes are allocated according to (5) (Guo et al. [3]). Namely, the solution (5) simultaneously minimizes the variance (that is, maximizes the precision) and maximizes the power.

The critical point for the power of hypotheses:

H_{0}: p_{1}=p_{2} vs H_{1}: p_{1}≠p_{2} (13)

for dichotomous data is given by Fleiss ([7])

${Z}_{\beta}=\frac{\left|p\right|-{z}_{\alpha}\sqrt{\left(\frac{1}{{n}_{1}}+\frac{1}{{n}_{2}}\right)\overline{p}\overline{q}}}{\sqrt{\frac{{p}_{1}{q}_{1}}{{n}_{1}}+\frac{{p}_{2}{q}_{2}}{{n}_{2}}}}\text{(14)}$

with

$\overline{p}=\frac{{n}_{1}{p}_{1}+{n}_{2}{p}_{2}}{{n}_{1}+{n}_{2}},\text{}\overline{q}=1-\overline{p},\text{}p={p}_{1}-{p}_{2}$

and z_{α} and Z_{β} are the standard normal cut-off points for the type I and type II errors, respectively. The optimal solution of (14) under constraint (4) has no closed form, so an iterative algorithm is needed to obtain it; the details can be found in [6]. As n_{1}+n_{2}→∞, however, the solution of (14) under constraint (4) is the same as (7).
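One simple way to approximate the exact optimum of (14) under (4) is to scan along the budget line and keep the allocation with the largest Z_{β}. The sketch below does this for the inputs of the earlier binary example. It is a plain grid search standing in for the iterative Lagrange procedure of [6], not the authors' algorithm, and it assumes z_{α} is the two-sided critical value (which reproduces the 84% power quoted in that example). The function names are ours.

```python
from statistics import NormalDist

def z_beta(n1, n2, p1, p2, alpha=0.05):
    """Right-hand side of (14); z_alpha taken as two-sided (assumption)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)
    null_se = ((1 / n1 + 1 / n2) * p_bar * (1 - p_bar)) ** 0.5
    alt_se = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
    return (abs(p1 - p2) - z_alpha * null_se) / alt_se

def best_allocation(p1, p2, c1, c2, budget):
    """Scan integer n1 along n1*c1 + n2*c2 = budget and maximize (14)."""
    best = None
    n1 = 2
    while budget - n1 * c1 >= 2 * c2:      # keep at least two subjects in group 2
        n2 = (budget - n1 * c1) / c2       # real-valued; round down in practice
        z = z_beta(n1, n2, p1, p2)
        if best is None or z > best[2]:
            best = (n1, n2, z)
        n1 += 1
    return best

n1, n2, z = best_allocation(0.1, 0.05, 40, 10, 21750)
z_closed = z_beta(399, 579, 0.1, 0.05)     # rounded allocation from (7)
print(n1, n2, NormalDist().cdf(z), NormalDist().cdf(z_closed))
```

Because (7) is only the large-sample solution, the scan need not return the same allocation for a finite budget; comparing the two power values shows how much is gained by the numerical search.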

Let us consider survival analysis with two independent samples. First assume that all subjects are followed until the event occurs, so that there is no censoring. For testing the hypotheses

H_{0}: λ_{1}=λ_{2} vs H_{1}: λ_{1}≠λ_{2} (15)

Pasternack and Gilbert [8] have given the following formula

${Z}_{\beta}=\frac{\left|{\lambda}_{1}-{\lambda}_{2}\right|-{z}_{\alpha}\sqrt{\left(\frac{1}{{n}_{1}}+\frac{1}{{n}_{2}}\right){\overline{\lambda}}^{2}}}{\sqrt{\frac{{\lambda}_{1}^{2}}{{n}_{1}}+\frac{{\lambda}_{2}^{2}}{{n}_{2}}}}\text{(16)}$

where

$\overline{\lambda}=\frac{{n}_{1}{\lambda}_{1}+{n}_{2}{\lambda}_{2}}{{n}_{1}+{n}_{2}}\text{(17)}$

Again the optimal solution for (16) under constraint (4) has no closed form; the asymptotic solution as n_{1}+n_{2}→∞ is given by

${n}_{1}=\frac{C{\lambda}_{1}}{\sqrt{{C}_{1}}\left({\lambda}_{1}\sqrt{{C}_{1}}+{\lambda}_{2}\sqrt{{C}_{2}}\right)},\text{}{n}_{2}=\frac{C{\lambda}_{2}}{\sqrt{{C}_{2}}\left({\lambda}_{1}\sqrt{{C}_{1}}+{\lambda}_{2}\sqrt{{C}_{2}}\right)}\text{(18)}$

Now we assume that there is censoring in the data, since most of the time we cannot follow all subjects until the event. Under some regularity conditions we have

${Z}_{\beta}=\frac{\left|{\lambda}_{1}-{\lambda}_{2}\right|-{z}_{\alpha}\sqrt{\left(\frac{1}{{n}_{1}}+\frac{1}{{n}_{2}}\right)\phi \left(\overline{\lambda}\right)}}{\sqrt{\frac{\phi \left({\lambda}_{1}\right)}{{n}_{1}}+\frac{\phi \left({\lambda}_{2}\right)}{{n}_{2}}}}\text{(19)}$

where $\overline{\lambda}$ is given by (17) and

$\phi \left(\lambda \right)=\frac{{\lambda}^{3}T}{\lambda T-1+{e}^{-\lambda T}}$

(see [9] for details). Clearly (16) is the special case of (19) with $\phi \left(\lambda \right)={\lambda}^{2}$, which is the limit of φ(λ) as T→∞. We need to use the iterative Lagrange multiplier method to get the optimal solution of (19) under (4). The asymptotic solution is given by

${n}_{1}=\frac{C\sqrt{\phi \left({\lambda}_{1}\right)}}{\sqrt{{C}_{1}}\left(\sqrt{\phi \left({\lambda}_{1}\right){C}_{1}}+\sqrt{\phi \left({\lambda}_{2}\right){C}_{2}}\right)},\text{}{n}_{2}=\frac{C\sqrt{\phi \left({\lambda}_{2}\right)}}{\sqrt{{C}_{2}}\left(\sqrt{\phi \left({\lambda}_{1}\right){C}_{1}}+\sqrt{\phi \left({\lambda}_{2}\right){C}_{2}}\right)}\text{(20)}$
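The asymptotic allocation (20) is straightforward to evaluate once φ is coded. In the sketch below the hazard rates, T and the costs are hypothetical values picked for illustration, and the function names are ours; T is used exactly as it appears in the definition of φ above.

```python
import math

def phi(lam, T):
    """phi(lambda) = lambda^3 * T / (lambda*T - 1 + exp(-lambda*T))."""
    x = lam * T
    # x + expm1(-x) equals x - 1 + exp(-x) but avoids cancellation for small x.
    return lam ** 3 * T / (x + math.expm1(-x))

def censored_allocation(lam1, lam2, T, c1, c2, budget):
    """Asymptotic allocation (20) under the cost constraint (4)."""
    s1 = math.sqrt(phi(lam1, T))
    s2 = math.sqrt(phi(lam2, T))
    denom = s1 * math.sqrt(c1) + s2 * math.sqrt(c2)
    n1 = budget * s1 / (math.sqrt(c1) * denom)
    n2 = budget * s2 / (math.sqrt(c2) * denom)
    return n1, n2

# Hypothetical inputs (not from the paper).
lam1, lam2, T = 0.5, 0.25, 5.0
c1, c2, budget = 40.0, 10.0, 21750.0
n1, n2 = censored_allocation(lam1, lam2, T, c1, c2, budget)
print(n1, n2, n1 * c1 + n2 * c2)

# For very large T, phi(lambda) approaches lambda^2 and (20) reduces to (18).
print(phi(lam1, 1000.0), lam1 ** 2)
```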

In many applications it is important to detect a possible difference between correlations. Suppose we have two independent samples with correlations r_{1} and r_{2}, respectively. Our hypotheses are

H_{0}: r_{1}=r_{2} vs H_{1}: r_{1}≠r_{2} (21)

Then

${Z}_{\beta}=\frac{\left|z\left({r}_{1}\right)-z\left({r}_{2}\right)\right|}{\sqrt{\frac{1}{{n}_{1}-3}+\frac{1}{{n}_{2}-3}}}-{z}_{\alpha}\text{(22)}$

where

$z\left(r\right)=\frac{1}{2}\mathrm{ln}\frac{1+r}{1-r}\text{(23)}$

according to Fisher’s arctanh transformation [10]. The test of hypotheses (21) reaches maximal power under constraint (4) when

${n}_{1}=3+\frac{C-3{C}_{1}-3{C}_{2}}{\sqrt{{C}_{1}}\left(\sqrt{{C}_{1}}+\sqrt{{C}_{2}}\right)},\text{}{n}_{2}=3+\frac{C-3{C}_{1}-3{C}_{2}}{\sqrt{{C}_{2}}\left(\sqrt{{C}_{1}}+\sqrt{{C}_{2}}\right)}\text{(24)}$

As an example we prove (24), which also shows how Lagrange multiplier theory can be used to prove the similar results above. To maximize the power we must maximize Z_{β} in (22). Equivalently, we only need to minimize

$\frac{1}{{n}_{1}-3}+\frac{1}{{n}_{2}-3}\text{(25)}$

under constraint (4). The corresponding Lagrangian, with λ now denoting the Lagrange multiplier rather than a hazard rate, is

$Q\left({n}_{1},{n}_{2},\lambda \right)=\frac{1}{{n}_{1}-3}+\frac{1}{{n}_{2}-3}+\lambda \left({n}_{1}{C}_{1}+{n}_{2}{C}_{2}-C\right)\text{(26)}$

and

$\begin{array}{c}\frac{\partial Q}{\partial {n}_{1}}=-\frac{1}{{\left({n}_{1}-3\right)}^{2}}+\lambda {C}_{1}=0\\ \frac{\partial Q}{\partial {n}_{2}}=-\frac{1}{{\left({n}_{2}-3\right)}^{2}}+\lambda {C}_{2}=0\end{array}$

imply

${n}_{1}=\sqrt{\frac{1}{\lambda {C}_{1}}}+3,\text{}{n}_{2}=\sqrt{\frac{1}{\lambda {C}_{2}}}+3\text{(27)}$

Plugging (27) into (4) and solving for λ, we get

$\lambda =\frac{{\left(\sqrt{{C}_{1}}+\sqrt{{C}_{2}}\right)}^{2}}{{\left(C-3{C}_{1}-3{C}_{2}\right)}^{2}}\text{(28)}$

Now we plug (28) into (27) and obtain (24). Since the Hessian matrix of Q with respect to n_{1} and n_{2} is

$\left[\begin{array}{cc}\frac{{\partial}^{2}Q}{\partial {n}_{1}^{2}}& \frac{{\partial}^{2}Q}{\partial {n}_{1}\partial {n}_{2}}\\ \frac{{\partial}^{2}Q}{\partial {n}_{2}\partial {n}_{1}}& \frac{{\partial}^{2}Q}{\partial {n}_{2}^{2}}\end{array}\right]=\left[\begin{array}{cc}\frac{2}{{\left({n}_{1}-3\right)}^{3}}& 0\\ 0& \frac{2}{{\left({n}_{2}-3\right)}^{3}}\end{array}\right]$

which is positive definite for n_{1}, n_{2} > 3, the allocation (24) minimizes (25) under constraint (4). Therefore it maximizes the power.

Other results can be proved similarly.
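The closed form (24) can also be checked numerically. The sketch below evaluates (24) for hypothetical costs and budget (not taken from the paper), then moves along the budget line in both directions to confirm that the objective (25) only gets worse. The function names are ours.

```python
import math

def correlation_allocation(c1, c2, budget):
    """Closed-form allocation (24) for comparing two correlations."""
    slack = budget - 3 * c1 - 3 * c2
    if slack <= 0:
        raise ValueError("budget must exceed 3 * (c1 + c2)")
    root_sum = math.sqrt(c1) + math.sqrt(c2)
    n1 = 3 + slack / (math.sqrt(c1) * root_sum)
    n2 = 3 + slack / (math.sqrt(c2) * root_sum)
    return n1, n2

def objective(n1, n2):
    """Quantity (25), which must be minimized to maximize Z_beta in (22)."""
    return 1 / (n1 - 3) + 1 / (n2 - 3)

# Hypothetical inputs (not from the paper).
c1, c2, budget = 40.0, 10.0, 21750.0
n1, n2 = correlation_allocation(c1, c2, budget)
best = objective(n1, n2)

# Perturb n1 and recompute n2 from the budget constraint (4).
perturbed = []
for step in (-5.0, -1.0, 1.0, 5.0):
    m1 = n1 + step
    m2 = (budget - m1 * c1) / c2
    perturbed.append(objective(m1, m2))

print(n1, n2, best, perturbed)
```

Note that (24) depends only on the costs and the budget, not on r_{1} or r_{2}, because the variance of Fisher's transform does not involve the correlation itself.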

**Example:** Suppose p_{1}=0.6, q_{1}=0.4, p_{2}=0.2, q_{2}=0.8, C_{1}=\$40,
C_{2}=\$10, and C=\$1000. If we want to test the difference at significance
level 0.05 and 80% power with equal sample sizes, then n_{1}=n_{2}=23. This
allocation gives a total cost of \$1150, which
is over the budget. If we use n_{1}=n_{2}=20 our power will be 75%, which
is usually not acceptable. Plugging all the parameters into formula
(7), we get n_{1}=17.75 and n_{2}=28.99. Choosing n_{1}=18 and n_{2}=28
gives power of 80% and an actual type I error of 0.043, which is
what we want.
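The power figures in this example can be approximated from (14). The sketch below assumes z_{α} is the two-sided critical value and uses the normal approximation only, so it should land near the quoted 80%, 75% and 80% rather than match them exactly. The function name is ours.

```python
from statistics import NormalDist

def power_two_proportions(n1, n2, p1, p2, alpha=0.05):
    """Approximate power from (14); z_alpha is two-sided (assumption)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)
    null_se = ((1 / n1 + 1 / n2) * p_bar * (1 - p_bar)) ** 0.5
    alt_se = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
    z_beta = (abs(p1 - p2) - z_alpha * null_se) / alt_se
    return std_normal.cdf(z_beta)

# Designs discussed in the example: equal n for 80% power, equal n within
# budget, and the rounded cost-optimal allocation from (7).
designs = [(23, 23), (20, 20), (18, 28)]
powers = {d: power_two_proportions(d[0], d[1], 0.6, 0.2) for d in designs}
for design, value in powers.items():
    print(design, round(value, 3))
```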

## Conclusion and Discussion

Cost constraints have an important impact on the design of experimental studies. We have studied optimal sample allocation to achieve maximal precision and power under a total financial constraint for the comparison of two samples. The results are easy to program and therefore have broad applications. There are limitations, however: the problems have been simplified, and we do not consider recruitment and related costs. For rare diseases, the minimal sample size for fixed power and false positive rate is more important than a fixed cost, because recruitment is difficult. Within these limits the results are directly applicable.

## Acknowledgement

We thank Dr. James Anderson for reading the manuscript and providing useful suggestions. We also appreciate the helpful comments of an anonymous referee.

## References

1. Cochran W. Sampling Techniques, 3rd edn. Wiley, New York. 1977; 448.
2. Allison DB, Allison RL, Faith MS, Paultre F, Pi-Sunyer FX. Power and Money: Designing Statistically Powerful Studies While Minimizing Financial Costs. Psychological Methods. 1997; 2: 20-33.
3. Guo JH, Chen HJ, Luh WM. Sample size planning with the cost constraint for testing superiority and equivalence of two independent groups. Br J Math Stat Psychol. 2011; 64: 439-461.
4. Guo JH, Luh WM. Optimum sample size allocation to minimize cost or maximize power for the two-sample trimmed mean test. Br J Math Stat Psychol. 2009; 62: 283-298.
5. Brittain E, Schlesselman JJ. Optimal allocation for the comparison of proportions. Biometrics. 1982; 38: 1003-1009.
6. Bertsekas DP. Nonlinear Programming. Athena Scientific, Belmont, MA. 1999.
7. Fleiss J. Statistical Methods for Rates and Proportions. Wiley, New York. 1973.
8. Pasternack BS, Gilbert HS. Planning the duration of long-term survival time studies designed for accrual by cohorts. J Chronic Dis. 1971; 24: 681-700.
9. Gross A, Clark V. Survival Distributions: Reliability Applications in Biomedical Science. Wiley, New York. 1975.
10. Freiman JA, Chalmers TC, Smith H Jr, Kuebler RR. The importance of beta, the type II error and sample size in the design and interpretation of the randomized control trial. Survey of 71 “negative” trials. N Engl J Med. 1978; 299: 690-694.