### Example \(\PageIndex{4}\)

One frequently reads that a poll has been taken to estimate the proportion of people in a certain population who favor one candidate over another in a race with two candidates. (This model also applies to races with more than two candidates \(A\) and \(B\), and to two ballot propositions.) Clearly, it is not possible for pollsters to ask everyone for their preference. What is done instead is to pick a subset of the population, called a sample, and ask everyone in the sample for their preference. Let \(p\) be the actual proportion of people in the population who are in favor of candidate \(A\) and let \(q = 1-p\). If we choose a sample of size \(n\) from the population, the preferences of the people in the sample can be represented by random variables \(X_1,\ X_2,\ \ldots,\ X_n\), where \(X_i = 1\) if person \(i\) is in favor of candidate \(A\), and \(X_i = 0\) if person \(i\) is in favor of candidate \(B\). Let \(S_n = X_1 + X_2 + \cdots + X_n\). If each subset of size \(n\) is chosen with the same probability, then \(S_n\) is hypergeometrically distributed. If \(n\) is small relative to the size of the population (which is typically true in practice), then \(S_n\) is approximately binomially distributed, with parameters \(n\) and \(p\).
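To get a feel for how close the hypergeometric and binomial distributions are when the sample is a small fraction of the population, the following minimal sketch compares the two probability mass functions term by term. The population size, number of supporters, and sample size are hypothetical values chosen only for illustration, and the helper names are not from the text.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(S_n = k) when n people are drawn without replacement
    from a population of N, of whom K favor candidate A."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    """P(S_n = k) under the binomial approximation with parameters n and p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical illustration values (assumptions, not taken from the text).
N, K, n = 10_000, 5_400, 50
p = K / N

# Largest pointwise difference between the two distributions.
max_gap = max(
    abs(hypergeom_pmf(k, N, K, n) - binom_pmf(k, n, p))
    for k in range(n + 1)
)
print(f"largest gap between the two pmfs: {max_gap:.6f}")
```

Increasing `n` toward `N` in this sketch makes the gap grow, which is the situation the binomial approximation is not meant for.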

The pollster wants to estimate the value \(p\). An estimate for \(p\) is provided by the value \(\bar p = S_n/n\), which is the proportion of people in the sample who favor candidate \(A\). The Central Limit Theorem says that the random variable \(\bar p\) is approximately normally distributed. (In fact, our version of the Central Limit Theorem says that the distribution function of the random variable \[S_n^* = \frac{S_n - np}{\sqrt{npq}}\] is approximated by the standard normal density.) But we have \[\bar p = \frac{S_n - np}{\sqrt{npq}} \sqrt{\frac{pq}{n}} + p\ ,\] i.e., \(\bar p\) is just a linear function of \(S_n^*\). Since the distribution of \(S_n^*\) is approximated by the standard normal density, the distribution of the random variable \(\bar p\) must also be bell-shaped. We also know how to write the mean and standard deviation of \(\bar p\) in terms of \(p\) and \(n\). The mean of \(\bar p\) is just \(p\), and the standard deviation is \[\sqrt{\frac{pq}{n}}\ .\] Thus, it is easy to write down the standardized version of \(\bar p\); it is \[\bar p^* = \frac{\bar p - p}{\sqrt{pq/n}}\ .\]
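The mean and standard deviation quoted above can be checked in a few lines, assuming the binomial approximation for \(S_n\), so that \(E(S_n) = np\) and \(V(S_n) = npq\). The symbols \(E\), \(V\), and \(D\) for expectation, variance, and standard deviation are notation assumed here for the sketch:

\[
\begin{aligned}
E(\bar p) &= E\!\left(\frac{S_n}{n}\right) = \frac{E(S_n)}{n} = \frac{np}{n} = p\ , \\
V(\bar p) &= V\!\left(\frac{S_n}{n}\right) = \frac{V(S_n)}{n^2} = \frac{npq}{n^2} = \frac{pq}{n}\ , \\
D(\bar p) &= \sqrt{V(\bar p)} = \sqrt{\frac{pq}{n}}\ .
\end{aligned}
\]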

Since the distribution of the standardized version of \(\bar p\) is approximated by the standard normal density, we know, for example, that 95% of its values will lie within two standard deviations of its mean, and the same is true of \(\bar p\). So we have \[P\left(p - 2\sqrt{\frac{pq}{n}} < \bar p < p + 2\sqrt{\frac{pq}{n}}\right) \approx .954\ .\] Now the pollster does not know \(p\) or \(q\), but he can use \(\bar p\) and \(\bar q = 1 - \bar p\) in their place without too much danger. With this idea in mind, the above statement is equivalent to the statement \[P\left(\bar p - 2\sqrt{\frac{\bar p \bar q}{n}} < p < \bar p + 2\sqrt{\frac{\bar p \bar q}{n}}\right) \approx .954\ .\] The resulting interval \[\left(\bar p - \frac{2\sqrt{\bar p \bar q}}{\sqrt n},\ \bar p + \frac{2\sqrt{\bar p \bar q}}{\sqrt n}\right)\] is called the 95 percent confidence interval for the unknown value of \(p\). The name is suggested by the fact that if we use this method to estimate \(p\) in a large number of samples, we should expect that in about 95 percent of the samples the true value of \(p\) is contained in the confidence interval obtained from the sample. In Exercise \(\PageIndex{11}\) you are asked to write a program to illustrate that this does indeed happen.

The pollster has control over the value of \(n\). Thus, if he wants to create a 95% confidence interval with length 6%, then he should choose a value of \(n\) so that \[\frac{2\sqrt{\bar p \bar q}}{\sqrt n} \le .03\ .\] Using the fact that \(\bar p \bar q \le 1/4\), no matter what the value of \(\bar p\) is, it is easy to show that if he chooses a value of \(n\) so that \[\frac{1}{\sqrt n} \le .03\ ,\] he will be safe. This is equivalent to choosing \[n \ge 1111\ .\] So if the pollster chooses \(n\) to be 1200, say, and calculates \(\bar p\) using his sample of size 1200, then 19 times out of 20 (i.e., 95% of the time), his confidence interval, which is of length 6%, will contain the true value of \(p\). This type of confidence interval is typically reported in the news as follows: this survey has a 3% margin of error. In fact, most of the surveys that one sees reported in the paper will have sample sizes around 1000.

A somewhat surprising fact is that the size of the population has apparently no effect on the sample size needed to obtain a 95% confidence interval for \(p\) with a given margin of error. To see this, note that the value of \(n\) that was needed depended only on the number .03, which is the margin of error. In other words, whether the population is of size 100,000 or 100,000,000, the pollster needs only to choose a sample of size 1200 or so to get the same accuracy of estimate of \(p\). (We did use the fact that the sample size was small relative to the population size in the argument that \(S_n\) is approximately binomially distributed.)
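The two calculations in this passage, the confidence interval and the "safe" sample size, can be sketched as follows. The function names and the observed proportion 0.54 are assumptions made for illustration. Note that \(1/.03^2\) is about 1111.1, so rounding up to a whole number of respondents gives 1112, one more than the rounded bound quoted in the text.

```python
from math import ceil, sqrt

def confidence_interval(p_bar, n):
    """Two-standard-deviation (approximately 95%) interval around the
    sample proportion p_bar, using p_bar and q_bar in place of p and q."""
    half_width = 2 * sqrt(p_bar * (1 - p_bar) / n)
    return p_bar - half_width, p_bar + half_width

def safe_sample_size(margin):
    """Smallest whole n with 1/sqrt(n) <= margin, i.e. the worst-case
    bound obtained from p_bar * q_bar <= 1/4."""
    return ceil(1 / margin**2)

n_needed = safe_sample_size(0.03)
print("sample size for a 3% margin of error:", n_needed)

# Hypothetical observed proportion, with the sample size used in the text.
low, high = confidence_interval(0.54, 1200)
print(f"interval: ({low:.4f}, {high:.4f}), length {high - low:.4f}")
```

Because the worst case \(\bar p \bar q = 1/4\) is used to pick \(n\), the interval computed from an actual sample is never longer than the 6% target.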

In Figure [fig 9.2.1], we show the results of simulating the polling process. The population is of size 100,000, and for the population, \(p = .54\). The sample size was chosen to be 1200. The spike graph shows the distribution of \(\bar p\) for 10,000 randomly chosen samples. For this simulation, the program kept track of the number of samples for which \(\bar p\) was within 3% of .54. This number was 9648, which is close to 95% of the number of samples used.
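A simulation of this kind might look like the sketch below. It is not the program used to produce the figure. It assumes sampling without replacement via Python's `random.sample`, and the function name, the seed, and the reduced number of trials in the demonstration call are illustrative choices.

```python
import random

def simulate_polls(pop_size=100_000, p=0.54, n=1200,
                   trials=10_000, margin=0.03, seed=0):
    """Return how many of `trials` simulated samples give a sample
    proportion p_bar within `margin` of the true proportion p."""
    rng = random.Random(seed)
    favor_a = round(pop_size * p)
    # 1 = favors candidate A, 0 = favors candidate B.
    population = [1] * favor_a + [0] * (pop_size - favor_a)
    hits = 0
    for _ in range(trials):
        # Draw n distinct people, so S_n is hypergeometrically distributed.
        s_n = sum(rng.sample(population, n))
        if abs(s_n / n - p) < margin:
            hits += 1
    return hits

# A smaller run than the 10,000 samples in the text, to keep it quick.
demo_trials = 2_000
demo_hits = simulate_polls(trials=demo_trials)
print(f"{demo_hits} of {demo_trials} samples had p_bar within 3% of p")
```

The exact count depends on the random seed, so it should not be expected to reproduce the figure 9648 reported above.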

Another way to see what the idea of confidence intervals means is shown in Figure [fig 9.2.2]. In this figure, we show 100 confidence intervals, obtained by computing \(\bar p\) for 100 different samples of size 1200 from the same population as before. The reader can see that most of these confidence intervals (96, to be exact) contain the true value of \(p\).
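The coverage experiment can be sketched in the same spirit. As a simplifying assumption, this version draws each sample from the binomial approximation (independent respondents, each favoring \(A\) with probability \(p\)) rather than from a fixed population. The function name and seed are illustrative.

```python
import random
from math import sqrt

def count_covering_intervals(p=0.54, n=1200, num_intervals=100, seed=1):
    """Count how many two-standard-deviation confidence intervals,
    each built from a fresh sample of size n, contain the true p."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(num_intervals):
        # Binomial approximation: n independent 0/1 responses.
        s_n = sum(rng.random() < p for _ in range(n))
        p_bar = s_n / n
        half_width = 2 * sqrt(p_bar * (1 - p_bar) / n)
        if p_bar - half_width < p < p_bar + half_width:
            covered += 1
    return covered

covered = count_covering_intervals()
print(f"{covered} of 100 intervals contain the true value of p")
```

Different seeds give different counts, typically in the mid-90s, which is what the phrase "95 percent confidence" is describing.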

The Gallup Poll has used these polling techniques in every presidential election since 1936 (and in countless other elections as well). Table [table 9.1] shows the results of their efforts. The reader will note that most of the approximations to \(p\) are within 3% of the actual value of \(p\). The sample sizes for these polls were typically around 1500. (In the table, both the predicted and actual percentages for the winning candidate refer to the percentage of the vote among the "major" political parties. In most elections, there were two major parties, but in several elections, there were three.)

| Year | Winning Candidate | Gallup Final Survey | Election Result | Deviation |
|------|-------------------|---------------------|-----------------|-----------|
| 1936 | Roosevelt | 55.7% | 62.5% | 6.8% |
| 1940 | Roosevelt | 52.0% | 55.0% | 3.0% |
| 1944 | Roosevelt | 51.5% | 53.3% | 1.8% |
| 1948 | Truman | 44.5% | 49.9% | 5.4% |
| 1952 | Eisenhower | 51.0% | 55.4% | 4.4% |
| 1956 | Eisenhower | 59.5% | 57.8% | 1.7% |
| 1960 | Kennedy | 51.0% | 50.1% | 0.9% |
| 1964 | Johnson | 64.0% | 61.3% | 2.7% |
| 1968 | Nixon | 43.0% | 43.5% | 0.5% |
| 1972 | Nixon | 62.0% | 61.8% | 0.2% |
| 1976 | Carter | 48.0% | 50.0% | 2.0% |
| 1980 | Reagan | 47.0% | 50.8% | 3.8% |
| 1984 | Reagan | 59.0% | 59.1% | 0.1% |
| 1988 | Bush | 56.0% | 53.9% | 2.1% |
| 1992 | Clinton | 49.0% | 43.2% | 5.8% |
| 1996 | Clinton | 52.0% | 50.1% | 1.9% |
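As a quick check on the claim that most of the estimates are within 3% of the actual value, the deviations can be recomputed from the survey and result columns of the table. The variable names are illustrative, and "within 3%" is taken here to include a deviation of exactly 3.0 points, which is an interpretation rather than something the text specifies.

```python
# (year, Gallup final survey %, election result %), copied from the table above.
gallup = [
    (1936, 55.7, 62.5), (1940, 52.0, 55.0), (1944, 51.5, 53.3),
    (1948, 44.5, 49.9), (1952, 51.0, 55.4), (1956, 59.5, 57.8),
    (1960, 51.0, 50.1), (1964, 64.0, 61.3), (1968, 43.0, 43.5),
    (1972, 62.0, 61.8), (1976, 48.0, 50.0), (1980, 47.0, 50.8),
    (1984, 59.0, 59.1), (1988, 56.0, 53.9), (1992, 49.0, 43.2),
    (1996, 52.0, 50.1),
]

# Absolute deviation in percentage points, rounded to one decimal place
# to match the precision of the table.
deviations = {
    year: round(abs(result - survey), 1)
    for year, survey, result in gallup
}

# Years in which the final survey was within 3 points of the result.
within_3 = [year for year, dev in deviations.items() if dev <= 3.0]
print(f"{len(within_3)} of {len(gallup)} polls were within 3 points")
```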

This technique also plays an important role in the evaluation of the effectiveness of drugs in the medical profession. For example, it is sometimes desired to know what proportion of patients will be helped by a new drug. This proportion can be estimated by giving the drug to a subset of the patients, and determining the proportion of this sample who are helped by the drug.
