Do error bars have to overlap the line of best fit?

You often print graphs in which the data points’ error bars seem too short: about a third of them don’t overlap the line of best fit. So which is wrong, the error bars or the line?

Narrowing in on fp, the fraction of stars that have planets. Extrasolar planet hunters such as Geoffrey Marcy, R. Paul Butler, and colleagues detect the slight gravitational tug exerted by a massive planet on its parent star. A fairly close analog to a solar-system planet orbits 47 Ursae Majoris. It is at least 2.4 times as massive as Jupiter and follows a roughly circular orbit around the star every 3 years.

This plot of the radial velocity of 47 Ursae Majoris, a planet-hosting star, shows measurements with error bars and a best-fit line.

Neither! When it comes to describing statistical uncertainty, such as drawing an error bar, the usual measure is plus or minus one “standard deviation.” In layfolks’ terms, this simply means the true value has a 68 percent chance of being inside the error bar, so roughly a third of the error bars should be expected to miss the line. (One standard deviation is often written “1 sigma” or “1σ.”)
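To see the one-in-three figure emerge, here is a minimal simulation sketch. It assumes the measurement errors follow a normal (bell-curve) distribution, that each error bar is exactly one standard deviation long, and that the line passes through the true value; the function name and the sample size are made up for illustration.

```python
import random

def fraction_missing(n_points=100_000, sigma=1.0, seed=0):
    """Fraction of simulated points whose 1-sigma error bar misses the true value."""
    rng = random.Random(seed)
    true_value = 0.0  # height of the "line" at this point; any value works
    misses = 0
    for _ in range(n_points):
        # Assumption: measurement scatter is Gaussian with standard deviation sigma.
        measured = rng.gauss(true_value, sigma)
        # The error bar spans measured - sigma to measured + sigma.
        if abs(measured - true_value) > sigma:
            misses += 1
    return misses / n_points

print(f"Fraction of error bars missing the line: {fraction_missing():.3f}")
```

The printed fraction should come out close to 0.32. A real best-fit line is itself estimated from the data, so the share of bars that miss it can differ slightly from this idealized case.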

If an error bar is labeled as representing two standard deviations (making it twice as tall on the same data point), it shows a 95 percent confidence level. Three standard deviations means a 99.7 percent confidence level. Why these numbers? Never mind. Just remember 68, 95, and 99.7 percent confidence for 1, 2, and 3 sigma, and you’ll have the most important thing you missed by not taking statistics. (In political polling, unlike astronomy, 2-sigma values are usually cited. If a poll has a “4 percent statistical error,” it means the true value has a 95 percent chance of being within 4 percent of the value measured.)
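For the curious, the three numbers can be checked in a few lines. This sketch assumes the errors follow a normal distribution, for which the chance of landing within k standard deviations of the mean is erf(k/√2); the helper name `coverage` is ours.

```python
import math

def coverage(k):
    """Probability that a normally distributed value lies within k sigma of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"{k} sigma: {100 * coverage(k):.1f} percent")
```

This prints 68.3, 95.4, and 99.7 percent, which round to the figures quoted above.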

All of this refers only to statistical or random error. You can reduce random error as close to zero as you want just by taking more data. The real bugaboo in science (and political polling) is systematic error, such as if the sample of people polled is somehow biased compared to the people who will vote. (Do Republicans answer their phones more than Democrats?) Systematic error can’t be reduced just by taking a larger sample.

For example, the more times you flip a coin, the closer you’ll come to getting 50 percent heads. The likely difference from 50 percent is the random error, based only on how many times you flip. But if the coin is weighted so that it’s slightly biased to one side, you’ve got a systematic effect mixed in as well. Fundamental to any science is reliably separating the two.
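The coin example is easy to try on a computer. The sketch below is only an illustration: the 51 percent weighting is an invented bias, and the random error is taken as the one-standard-deviation scatter expected for a fair coin, 0.5 divided by the square root of the number of flips.

```python
import math
import random

def heads_fraction(n_flips, p_heads, rng):
    """Flip a coin n_flips times and return the fraction that came up heads."""
    heads = sum(1 for _ in range(n_flips) if rng.random() < p_heads)
    return heads / n_flips

rng = random.Random(1)  # fixed seed so the run is repeatable
for n in (100, 10_000, 1_000_000):
    random_error = 0.5 / math.sqrt(n)        # 1-sigma scatter for a fair coin
    fair = heads_fraction(n, 0.50, rng)
    weighted = heads_fraction(n, 0.51, rng)  # hypothetical 1 percent bias
    print(f"{n:>9} flips: fair {fair:.4f}, weighted {weighted:.4f}, "
          f"random error about {random_error:.4f}")
```

With only 100 flips the random error is about 5 percent, so the two coins are hard to tell apart. By a million flips it has shrunk to about 0.05 percent, while the weighted coin's 1 percent offset remains no matter how long you keep flipping.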

— Alan MacRobert