Computing Polling Error in a Simple Survey

An Interactive Applet powered by Sage and MathJax.

(By Prof. Gregory V. Bard. Updated by Ryan G. Hornberger.)

Overview

Have you ever noticed that political surveys often carry a remark such as "the margin of error of this survey is plus or minus 3%"? Have you ever wondered what that really means, and how it is calculated?

Let's imagine that you poll 100 people to determine if they are in favor of or against some new local legislation or a candidate for election, or even to ask something simple, such as whether they are married. These are all questions to which the answer must be "yes" or "no." To keep things simple, let's say you poll people to ask them if they are in favor of Governor John Q. Public's re-election.

Now let's suppose 41 people say "yes." That means 41% said "yes," and 59% said "no." (For this simple example, we do not consider the possibility that someone said "not sure.") Is it really the case that 41% of everyone in the town you polled is in favor of re-electing Governor John Q. Public? If the true percentage were 39%, 40%, 42%, or 43%, we should not be surprised. After all, we did not survey everyone in that town. This gap between the survey result and the true percentage is called polling error.

In this specific case, we will use the interactive applet below to compute that with 95% confidence, the percent in the entire town who would have said "yes" is between 31.16% and 50.84%. Likewise, the percent in the entire town who would have said "no" is between 49.16% and 68.84%. I bet those ranges are a lot wider than you might have first guessed.
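For readers who would rather script the calculation than use the sliders, here is a minimal Python sketch. It assumes the usual normal approximation, in which the interval is the observed proportion plus or minus two standard errors. The function name polling_interval and its multiplier parameter are illustrative and are not taken from the applet's own code.

```python
import math

def polling_interval(yes, total, multiplier=2):
    """Return (low, high) bounds for the true share of "yes" answers.

    Assumption: normal approximation, p +/- multiplier * sqrt(p * (1 - p) / n).
    """
    if total <= 0 or not 0 <= yes <= total:
        raise ValueError("need total > 0 and 0 <= yes <= total")
    p = yes / total                                   # observed proportion
    standard_error = math.sqrt(p * (1 - p) / total)   # standard error of p
    margin = multiplier * standard_error              # margin of error
    return p - margin, p + margin

yes_low, yes_high = polling_interval(41, 100)
no_low, no_high = polling_interval(59, 100)
print(f"yes: {yes_low:.2%} to {yes_high:.2%}")
print(f"no:  {no_low:.2%} to {no_high:.2%}")
```

With 41 "yes" answers out of 100, this prints 31.16% to 50.84% for "yes" and 49.16% to 68.84% for "no," matching the figures above. The sketch does not clip the bounds to the range 0% to 100%, which would matter for very lopsided or very small surveys.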

Whenever you perform a survey to determine what percentage of some population is in favor of or against some particular thing, or to ask any question with a yes/no outcome, you should always compute the margin of error and the 95% confidence interval, to be honest with your readership about polling error.

Many times these important calculations are not performed. I think that is because it is a multi-step computation, and that makes it both tedious and hard to remember. That's particularly the case for people who only need to do this once every few years. Even I always look the formulas up before using them, just to make sure that I do not misremember one of them.

I have created the following tool to perform the calculation for you.

Instructions

First, click on the button marked "launch the interactive applet now."
Second, move the top slider so that it shows the correct number of people who answered your question "yes."
Third, move the bottom slider so that it shows the correct number of people who answered your question at all.
Fourth, copy down the 95% confidence intervals for "yes" and for "no," as well as any other statistics that might be useful for you.
Fifth, if you are curious about how these things are computed, then all you must do is switch "Show the Math?" from "false" to "true." Then you will see all the calculations.
Note that percentages are generally reported to the nearest hundredth of a percent or the nearest tenth of a percent. So if you see 0.311633338981136 in your answer, you would typically report it as either 31.16% or 31.2% in research publications. In items for the general public, you would typically see either 31.2% or 31%.
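If you copy the raw decimal into your own script, the rounding can be done for you. The following is a small Python sketch; the helper name as_percent is hypothetical and exists only for this illustration.

```python
def as_percent(proportion, decimals):
    """Format a proportion such as 0.3116 as a percentage string."""
    return f"{proportion * 100:.{decimals}f}%"

raw = 0.311633338981136   # the raw value used as an example in the text

print(as_percent(raw, 2))  # nearest hundredth of a percent
print(as_percent(raw, 1))  # nearest tenth of a percent
print(as_percent(raw, 0))  # nearest whole percent
```

This prints 31.16%, 31.2%, and 31%, the three forms mentioned in the note above.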

Is 95% Confidence the Right Choice?

By the way, the 95% confidence interval is a bedrock of social science research, but many physical scientists prefer a stricter interval with roughly 99.7% confidence. As it turns out, this just means replacing a "2" with a "3" in one of the formulas, so it isn't a large change mathematically. In our example above, we obtain 26.25% to 55.75% for "yes" and 44.25% to 73.75% for "no." There is only about a 1 in 370 chance that the truth is outside of these intervals. (Using the 95% confidence interval, there is a 1 in 20 chance that the truth is outside of the intervals which we discussed earlier.)
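The confidence levels attached to the multipliers "2" and "3" are rounded figures. As a rough check, and assuming the normal approximation holds, the probability of landing within k standard errors of the truth is erf(k / sqrt(2)). The function name coverage below is illustrative only.

```python
import math

def coverage(multiplier):
    """Probability that a normal variable lies within
    +/- multiplier standard deviations of its mean."""
    return math.erf(multiplier / math.sqrt(2))

for k in (2, 3):
    inside = coverage(k)
    one_in = 1 / (1 - inside)   # odds that the truth falls outside
    print(f"{k} standard errors: {inside:.2%} inside, "
          f"about 1 in {one_in:.0f} outside")
```

This gives about 95.45% (roughly 1 in 22 outside) for two standard errors and about 99.73% (roughly 1 in 370 outside) for three, which is where the familiar round numbers come from.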

These intervals matter a great deal for our conclusions. If we didn't perform these calculations, and only heard that 41 out of 100 said that they were in favor of Governor John Q. Public's re-election, then we would think that he will surely lose. However, if you look at the intervals in the previous paragraph, you would not walk away with that conclusion. The upper end of the wider "yes" interval is 55.75%, so we cannot rule out that Governor John Q. Public commands a majority of the voters of this particular town.

Moreover, there is a danger that this data, if reported without confidence intervals, might result in some of Governor John Q. Public's opponents staying home and not voting on election day, thinking that he will surely lose. Of course, that increases his chances of winning.

A Discussion of Sources of Error

The above margins of error take care of polling error. That's the possibility that by sheer happenstance, your sample had an extra large or extra small number of people who might say yes or no to your question. However, there are other forms of error which are not taken into account here.

Non-Representative Samples

Details will be posted later.

Response Bias

Details will be posted later.

Order of the Questions

Details will be posted later.

Non-Binary Answers

Details will be posted later.

Phrasing of the Questions

Details will be posted later.

Last modified on June 8th, 2022.