Here we consider the joint estimation of a multivariate set of population means. That is, we have observed a set of p X-variables and may wish to estimate the population mean for each variable. In some instances, we may also want to estimate one or more linear combinations of population means. Our basic tool for estimating the unknown value of a population parameter is a confidence interval, an interval of values that is likely to include the unknown value of the parameter.

General Format for a Confidence Interval

The general format of a confidence interval estimate of a population mean is: \(\text{Sample mean} \pm \text{Multiplier} \times \text{Standard error of mean}\)

For variable \(X_{j}\), a confidence interval estimate of its population mean \(\mu_{j}\) is \(\bar{x}_j \pm \text{Multiplier}\dfrac{s_j}{\sqrt{n}}\)

In this formula, \(\bar{x}_{j}\) is the sample mean, \(s_{j}\) is the sample standard deviation and n is the sample size. The multiplier value is a function of the confidence level, the sample size, and the strategy used for dealing with the multiple inference issue. The following list covers some common strategies:

One at a Time Intervals

For a \(1 - \alpha\) confidence interval, the "one at a time" multiplier is the t-value such that the probability is \(1 - \alpha\) between \(-t\) and \(+t\) under a t-distribution with n - 1 degrees of freedom. Said another way, the value of t is such that the probability greater than \(+t\) is \(\alpha/2\). Notationally, the one at a time multiplier is: \(\text{Multiplier} = t_{n-1}(\alpha/2)\)

With this notation, a confidence interval for \(\mu_{j}\) is computed as: \(\bar{x}_j \pm t_{n-1}(\alpha/2)\frac{s_j}{\sqrt{n}}\)

Note! The notation for the t-multiplier can be confusing because it varies between textbooks and statistical software. For instance, Excel's command to determine the t-multiplier requires that you give the two-tailed probability α, whereas SAS requires that you give the cumulative probability \(1 - \alpha / 2\) for the desired t-value.

Example 5-1: One at a Time Intervals

Suppose that the sample size is n = 25 and we want a 95% confidence interval for the population mean. Thus \(\alpha = 0.05\). Our textbook would write the multiplier as \(t_{24}(.025)\). In Excel, the command =TINV(.05,24) will give the multiplier (value = 2.064). In SAS, a command such as t1=tinv(.975,24) will make the variable t1 that contains the desired multiplier.

Bonferroni Method Multiplier

When we determine confidence intervals for the population means of several variables, we are creating a family of confidence intervals.
The family-wide error rate is the probability that at least one of the confidence intervals in the family will not capture the population mean. The family-wide confidence level = 1 - family-wide error rate.

Suppose that we have a family of p confidence intervals and the error rates for the individual intervals are \(\alpha_{1}, \alpha_{2}, \dots, \alpha_{p}\). The Bonferroni Inequality states that the family-wide error rate is less than or equal to the sum of \(\alpha_{1}, \alpha_{2}, \dots, \alpha_{p}\). That is, family-wide error rate \(\leq \Sigma \alpha_{i}\). In terms of the family-wide confidence that all intervals capture their population means, we can write this as \(1 - \Sigma \alpha_{i} \leq\) family-wide confidence level.

Most often, we divide the desired family-wide error rate equally across the intervals that we will compute. If we are computing p confidence intervals with a desired family-wide error rate of \(\alpha\), we use an error rate of \(\alpha / p\) (so confidence \(= 1 - \alpha / p\)) for each individual interval. This guarantees that the family-wide confidence level will be greater than or equal to \(1 - \alpha\).

Suppose that we are calculating p intervals with a family error rate equal to \(\alpha\). Notationally, the Bonferroni method multiplier is: \(\text{Multiplier} = t_{n-1}(\alpha/(2p))\)

A confidence interval for \(\mu_{j}\) is computed as: \(\bar{x}_j \pm t_{n-1}(\alpha/(2p))\frac{s_j}{\sqrt{n}}\)

Example 5-2: Bonferroni Method Multiplier

Suppose that n = 25. The family-wide error rate = 5% for a family confidence = 95%. We are computing intervals for p = 5 means. The error rate for each interval will be .05/5 = 1%. We might use the Excel command =TINV(.01,24) to find that the multiplier = 2.797. In SAS, we use the cumulative probability \(= 1 - \alpha/(2p)\), so the command for finding the t-multiplier in this instance is something like t1=tinv(.995, 24).
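The t multipliers in Examples 5-1 and 5-2 can be checked without Excel or SAS. The following is a minimal sketch using only the Python standard library: it integrates the t density numerically and finds the upper-tail value by bisection. The function names (`t_pdf`, `t_cdf`, `t_upper`) are illustrative assumptions, not part of the lesson; in practice a statistics library such as SciPy (`scipy.stats.t.ppf`) would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=1000):
    """P(T <= x), using composite Simpson integration from 0 to x (steps is even)."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_upper(tail, df):
    """Value t with P(T > t) = tail, written t_df(tail) in the text (tail < 0.5)."""
    lo, hi = 0.0, 50.0  # assumed search range; ample for typical tail areas
    for _ in range(50):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > tail:
            lo = mid  # too much area to the right: move right
        else:
            hi = mid
    return (lo + hi) / 2

n, alpha = 25, 0.05
one_at_a_time = t_upper(alpha / 2, n - 1)        # Example 5-1: t_24(.025)
bonferroni = t_upper(alpha / (2 * 5), n - 1)     # Example 5-2: p = 5, t_24(.005)
print(round(one_at_a_time, 3), round(bonferroni, 3))
```

The two printed values should agree, up to rounding, with the multipliers 2.064 and 2.797 quoted in the examples. Note that `t_upper` takes the upper-tail area directly, which matches the textbook notation rather than the Excel or SAS conventions.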
Simultaneous Confidence Region Multiplier

This method is derived from properties of the multivariate normal distribution. The multiplier applies to the family of all possible linear combinations of the population means considered, including the individual means. It is conservative (meaning that the multiplier tends to be larger than absolutely necessary). When a family-wide confidence level is used, compare the value of this multiplier to the Bonferroni method multiplier and use the smaller of the two.

Notationally, the simultaneous confidence region multiplier is: \(\text{Multiplier}=\sqrt{\frac{p(n-1)}{n-p}F_{p,n-p}(\alpha)}\)

Here \(F_{p,n-p}(\alpha)\) represents a value of F such that the probability greater than this value is α under an F-distribution with p and n - p degrees of freedom.

Example 5-3: Simultaneous Confidence Region Multiplier

Suppose that we have a sample size of n = 25 and we have p = 3 variables. With a 5% family error rate (and 95% family confidence), the F-value can be found in Excel using =FINV(.05, 3, 22) = 3.049. SAS uses cumulative probabilities, so in this case a command like f1=FINV(.95, 3, 22) would make f1 be the F-value. The multiplier in this example is \(\sqrt{\frac{3(25-1)}{25-3}(3.049)}=3.159\)

This multiplier could be used for all confidence intervals for parameters that are linear combinations of the three population means (and for the three individual means).

Summary of Multipliers

The following summarizes the three different multipliers and gives notes about using Excel and SAS.

One at a time: multiplier \(t_{n-1}(\alpha/2)\). Excel: =TINV(α, n-1). SAS: tinv(1-α/2, n-1).
Bonferroni: multiplier \(t_{n-1}(\alpha/(2p))\). Excel: =TINV(α/p, n-1). SAS: tinv(1-α/(2p), n-1).
Simultaneous: multiplier \(\sqrt{\frac{p(n-1)}{n-p}F_{p,n-p}(\alpha)}\). Excel: =FINV(α, p, n-p). SAS: finv(1-α, p, n-p).
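The simultaneous multiplier in Example 5-3 can be reproduced with a short standard-library sketch. It integrates the F density numerically and inverts it by bisection. The helper names (`f_pdf`, `f_cdf`, `f_upper`, `simultaneous_multiplier`) are assumptions made for illustration, and the sketch assumes p ≥ 2 and a critical value below 50. A library routine such as SciPy's `scipy.stats.f.ppf` would be the usual choice in real work.

```python
import math

def f_pdf(x, d1, d2):
    """Density of the F distribution with d1 and d2 degrees of freedom (d1 >= 2)."""
    if x < 0:
        return 0.0
    if x == 0:
        return 1.0 if d1 == 2 else 0.0
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_pdf = ((d1 / 2) * math.log(d1 / d2) + (d1 / 2 - 1) * math.log(x)
               - ((d1 + d2) / 2) * math.log1p(d1 * x / d2) - log_beta)
    return math.exp(log_pdf)

def f_cdf(x, d1, d2, steps=4000):
    """P(F <= x), using composite Simpson integration from 0 to x (steps is even)."""
    h = x / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return total * h / 3

def f_upper(tail, d1, d2, hi=50.0):
    """Value f with P(F > f) = tail; assumes the answer lies in (0, hi)."""
    lo = 0.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if 1 - f_cdf(mid, d1, d2) > tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def simultaneous_multiplier(n, p, alpha):
    """sqrt( p(n-1)/(n-p) * F_{p,n-p}(alpha) ), as in the formula above."""
    return math.sqrt(p * (n - 1) / (n - p) * f_upper(alpha, p, n - p))

f_value = f_upper(0.05, 3, 22)
multiplier = simultaneous_multiplier(n=25, p=3, alpha=0.05)
print(round(f_value, 3), round(multiplier, 3))
```

The numerical integration is only approximate, so the printed values should match 3.049 and 3.159 to about the precision shown in the example rather than exactly.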
Example 5-4

This example uses the dataset that includes mineral content measurements at three different arm bone locations for n = 25 women. We'll determine confidence intervals for the three different population means. Sample means and standard deviations for the three variables are:

Simple Statistics
For the one at a time intervals, we'll use a .95 confidence level for each interval. With n = 25, df = 24 and \(t_{24}(.025) = 2.064\). This can be found in Excel as =TINV(.05,24). The confidence intervals have the form \(\bar{x}_j \pm 2.064\dfrac{s_j}{\sqrt{n}}\). Intervals are the following.
For the Bonferroni intervals, we'll use a .95 family-wide confidence level, so the family error rate = .05. For each interval, the error rate = .05/3 = 0.016666… The multiplier is \(t_{24}(.008333) = 2.574\), which can be found in Excel as =TINV(.05/3,24). The confidence intervals have the form \(\bar{x}_j \pm 2.574\dfrac{s_j}{\sqrt{n}}\). Intervals are the following.
For the simultaneous intervals, the necessary multiplier is \(\sqrt{\dfrac{3(25-1)}{25-3}(3.049)} = 3.159\), where 3.049 is the F-value. (See Example 5-3 above for details.) The confidence intervals have the form \(\bar{x}_j \pm 3.159 \dfrac{s_j}{\sqrt{n}}\). Intervals are the following.
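To see how the three multipliers from this example change the width of an interval, here is a small sketch that applies each one to the same summary statistics. The multipliers 2.064, 2.574 and 3.159 come from the text above. The sample mean and standard deviation are hypothetical placeholder values chosen only for illustration, not the bone mineral data, and the `interval` helper is likewise an assumed name.

```python
import math

def interval(mean, sd, n, multiplier):
    """Return (lower, upper) for mean +/- multiplier * sd / sqrt(n)."""
    half_width = multiplier * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

# Multipliers taken from Example 5-4 (n = 25, p = 3, 95% confidence).
multipliers = {
    "one at a time": 2.064,
    "Bonferroni": 2.574,
    "simultaneous": 3.159,
}

# Hypothetical summary statistics, NOT values from the dataset.
mean, sd, n = 10.0, 2.0, 25

intervals = {name: interval(mean, sd, n, m) for name, m in multipliers.items()}
for name, (lower, upper) in intervals.items():
    print(f"{name}: ({lower:.3f}, {upper:.3f})")
```

Whatever summary statistics are supplied, the one at a time interval is the narrowest and the simultaneous interval the widest here, because only the multiplier differs between the three methods.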
Using SAS

Steve Rathbun, formerly of Penn State, wrote the following SAS code (download below) to generate confidence intervals for population means using the three methods discussed here. The code reads a dataset, reshapes it to have a data line for each variable value, determines means and standard deviations, and then calculates and prints the three types of intervals. To use this code for different situations, you need only change the third line, where the value of p is set, and the data step, where the data set is read and reshaped.

Download the SAS program here: CI_pop_means.sas

The output for the program just given is below. It includes the sample mean and variance for each variable and the three confidence intervals. Limits for the one at a time intervals are given as loone and upone. Limits for the Bonferroni method are given as lobon and upbon. Limits for the simultaneous confidence region method are given as losim and upsim.
Example: Nutrient Intake Data - Descriptive Statistics The MEANS Procedure
Using Minitab

Click on the video below for walkthroughs of the three methods: the one at a time confidence interval, the Bonferroni method, and the multivariate simultaneous interval method, all in the Minitab statistical software application.
Which distribution is used in developing an interval estimate for a population mean?

The large-sample procedure for developing interval estimates of a population mean uses the normal distribution. In the small-sample case (i.e., where the sample size n is less than 30), the t distribution is used when specifying the margin of error and constructing a confidence interval estimate.
Which distribution is used in developing an interval estimate when the population standard deviation is known?

The general form of a confidence interval for a single population mean, with known standard deviation and a normal distribution, is \(\bar{X} - Z_{\alpha}\left(\dfrac{\sigma}{\sqrt{n}}\right) \leq \mu \leq \bar{X} + Z_{\alpha}\left(\dfrac{\sigma}{\sqrt{n}}\right)\). Here \(Z_{\alpha}\) denotes the standard normal multiplier for the chosen confidence level (1.96 for 95% confidence). This formula is used when the population standard deviation is known.
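As a quick illustration of the known-standard-deviation case, the sketch below computes a z-based interval using `statistics.NormalDist` from the Python standard library. The function name `z_interval` and the input values (mean 50, σ = 10, n = 100) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math
from statistics import NormalDist

def z_interval(mean, sigma, n, confidence=0.95):
    """Interval mean +/- z * sigma / sqrt(n), with sigma the KNOWN population sd."""
    alpha = 1 - confidence
    # z is the standard normal value with upper-tail probability alpha / 2.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width

# Hypothetical inputs for illustration only.
lower, upper = z_interval(mean=50.0, sigma=10.0, n=100)
print(f"({lower:.2f}, {upper:.2f})")
```

With these inputs the standard error is 10/√100 = 1, so the half-width is simply the z multiplier itself, roughly 1.96 at 95% confidence.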
What is an interval estimate used to estimate?

A point estimate is a single value used to estimate a population parameter. An interval estimate is a range of values used to estimate a population parameter. At a 95% confidence level, approximately 95 out of 100 such intervals would include the true value of the population parameter.
What is the probability that the interval estimate contains the population parameter?

Intervals are commonly constructed so that the procedure captures the parameter with a probability of 95 or 99 percent. This probability is called the confidence coefficient.
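The meaning of the confidence coefficient can be illustrated by simulation: draw many samples from a population with a known mean, build an interval from each, and count how often the interval contains the true mean. The sketch below does this for the known-σ z interval. The function name `coverage`, the population settings, the number of repetitions and the seed are all arbitrary assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage(mu=0.0, sigma=1.0, n=25, confidence=0.95, reps=2000, seed=1):
    """Fraction of simulated z intervals (known sigma) that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps

observed = coverage()
print(f"observed coverage: {observed:.3f}")
```

The observed fraction should land close to the nominal 0.95, with some run-to-run variation that shrinks as `reps` grows. The probability statement is about the interval-building procedure, not about any single computed interval.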