*Statistical reasoning is skeptical.*

*– Paul F. Velleman*[1]

In early September 2016, the US Centers for Disease Control and Prevention (CDC) released results of the National Health Interview Survey for the first quarter of 2016. Based on the CDC report, the Associated Press (AP) reported to the American public that, as a result of provisions of the Affordable Care Act (ACA), the number of medically uninsured Americans was at a “record low” in the first quarter of 2016, but that progress towards insuring all Americans had slowed “significantly”. As a statistician, I was bothered by the AP’s use of the qualifying words “record” and “significantly” to describe the results. Specifically, I asked, “Were the qualifying terms used in the news report statistically accurate descriptions of the change in the proportion of uninsured Americans and of the trend in that proportion over time?”

The AP report was directed at US citizens, so I used information that was easily available and accessible to most Americans: descriptive statistics reported on a publicly accessible CDC website. Because the ACA went into effect during the first quarter of 2014, I limited the data to percentages and standard errors from the first quarter of 2014 through the first quarter of 2016, which I obtained from Table 3 of the CDC publication “Early Release of Selected Estimates Based on Data from the 2014 National Health Interview Survey”. Because the AP report focused on changes in the number of uninsured Americans due to the ACA, I limited my analysis to Americans ages 18 to 64, the age group for whom the ACA-mandated insurance marketplace was intended.

Most Americans aren’t statisticians with access to powerful statistical analysis software, so I deliberately did not use SAS, R, or other high-powered but relatively inaccessible statistical packages for my calculations. Instead, I used the widely available Microsoft Excel, although I did double-check some of the results with the statistical software Stata.

The first analysis was to answer whether a statistically significant decrease had occurred in the proportion of uninsured Americans ages 18 to 64. Because the CDC reported the standard error for the percentage of uninsured in each quarter, I was able to calculate a 95% confidence interval describing the precision of each proportion. I reasoned that if the 95% confidence intervals for two quarters did not overlap, there was a statistically significant difference in the proportion of uninsured Americans, ages 18 to 64, between the two quarters. (This overlap rule is conservative: non-overlapping intervals imply a significant difference, while overlapping intervals do not always rule one out.)
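The interval calculation and overlap check above are simple enough to sketch in a few lines. The sketch below uses the standard normal approximation (estimate ± 1.96 × standard error); the percentage and standard-error values in it are illustrative placeholders, not the CDC’s published figures.

```python
Z = 1.96  # multiplier for a 95% confidence interval (normal approximation)

def ci_95(pct, se):
    """Return the 95% confidence interval (lower, upper) for a percentage."""
    return (pct - Z * se, pct + Z * se)

def intervals_overlap(ci_a, ci_b):
    """True if two (lower, upper) intervals share any values."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

# Placeholder (percent uninsured, standard error) values for two quarters --
# NOT the CDC's published numbers.
q_a = ci_95(13.0, 0.5)
q_b = ci_95(12.4, 0.5)
print(intervals_overlap(q_a, q_b))  # overlapping intervals: no claimed difference
```

The same two functions, applied to each pair of quarters in turn, reproduce every comparison discussed below.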

The 95% confidence interval for the proportion of uninsured Americans in the first quarter of 2016 overlapped with the 95% confidence interval for the proportion of uninsured Americans in the first quarter of 2015. The overlapping confidence intervals showed that there was not a statistically significant difference in the proportions of uninsured Americans, ages 18 to 64, between these two quarters. In contrast, the 95% confidence interval for the proportion of uninsured Americans in the first quarter of 2014 did not overlap with either of the 95% confidence intervals for the first quarters of 2015 or 2016. The non-overlapping confidence intervals indicated that the proportion of uninsured Americans decreased significantly when either the first quarter of 2015 or the first quarter of 2016 was compared to the first quarter of 2014, which was when the ACA took effect. So the AP didn’t exactly misstate the statistics: there was a statistically significant decrease in uninsured Americans between 2014 and either 2015 or 2016, but not between 2015 and 2016. The AP, however, didn’t specify a time frame for its comparison.

The second analysis was to answer whether progress towards insuring all Americans had slowed “significantly”. I went at this backwards, by looking at the trend in reducing the number of uninsured Americans, ages 18 to 64. First, I used the CDC’s 2014 and 2015 quarterly proportions to forecast the proportion of uninsured Americans, ages 18 to 64, during the first quarter of 2016. From this forecast, I used the standard error reported for the observed proportion in the first quarter of 2016 to calculate a 95% confidence interval describing the precision of the forecasted 2016 first-quarter proportion.
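One straightforward way to carry out this forecast step is a least-squares extrapolation over the eight quarters of 2014–2015 (the article does not spell out the exact method, so this is an assumption). The quarterly percentages below are placeholders, not the CDC’s published values.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

quarters = list(range(1, 9))  # 2014 Q1 .. 2015 Q4
# Placeholder percentages of uninsured, ages 18-64 -- NOT the CDC figures.
pct_uninsured = [18.4, 16.1, 15.2, 14.2, 13.0, 12.7, 12.2, 11.9]

slope, intercept = fit_line(quarters, pct_uninsured)
forecast_2016_q1 = slope * 9 + intercept  # extrapolate to the ninth quarter

# 95% CI around the forecast, borrowing the SE reported for the observed
# 2016 Q1 proportion (placeholder value).
se_2016_q1 = 0.5
ci = (forecast_2016_q1 - 1.96 * se_2016_q1,
      forecast_2016_q1 + 1.96 * se_2016_q1)
```

Excel’s SLOPE and INTERCEPT worksheet functions compute the same least-squares fit.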

Although the forecasted proportion of uninsured Americans, ages 18 to 64, in the first quarter of 2016 was lower than the actual observed proportion in that quarter, the 95% confidence intervals for the forecasted and observed proportions overlapped. This showed that there was not a statistically significant difference between the observed proportion of uninsured Americans, ages 18 to 64, in the first quarter of 2016 and the forecasted proportion for that quarter. The observed proportion was consistent with what would have been expected if the downward trend observed during 2014 and 2015 had continued without changing.

Next, I conducted two linear regression analyses to obtain the slopes of two trend lines for the decrease in the proportion of uninsured Americans, ages 18 to 64, over the nine quarters from 2014 to 2016. The two data sets used in the regressions were nearly identical; one used the forecasted proportion for the first quarter of 2016, and the other used the actual, observed proportion for that quarter. Each regression produced a trend line, which described the overall nature of the decrease in uninsured Americans, and the slope of that trend line, which described the rate of decrease. I also calculated 95% confidence intervals describing the precision of the trend-line and slope estimates.
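The slope confidence interval in this step can be sketched with the standard textbook formula for simple linear regression, SE(b) = s / √Σ(x − x̄)², multiplied by a t critical value; Excel’s LINEST function reports the same slope and standard error. The data passed to the function are whatever nine-quarter series is being analyzed; nothing here is the CDC’s published data.

```python
import math

def slope_with_ci(xs, ys, t_crit=2.365):
    """Least-squares slope and its 95% CI.

    t_crit defaults to the two-sided 95% t value for 7 degrees of
    freedom (9 data points, simple linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(r * r for r in residuals) / (n - 2))  # residual std. error
    se_slope = s / math.sqrt(sxx)
    return slope, (slope - t_crit * se_slope, slope + t_crit * se_slope)
```

Running `slope_with_ci` twice, once on the series ending in the observed 2016 Q1 value and once on the series ending in the forecasted value, yields the two slopes and confidence intervals whose overlap is compared below.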

The 95% confidence intervals describing the precision of the observed and forecasted trend lines overlapped; hence, there was no statistically significant difference between the actual and forecasted trend lines. In addition, the 95% confidence intervals for the slopes of the observed and forecasted trend lines also overlapped, so there was no statistically significant difference between the slopes. Because the two trend lines were statistically indistinguishable, both in overall character and in rate of decrease, progress in decreasing the number of uninsured Americans, ages 18 to 64, had not slowed. Consequently, the AP report did not accurately reflect the “nation’s progress in getting more people covered by health insurance”. The decrease in uninsured Americans, ages 18 to 64, had continued at the same rate as before.

My healthy skepticism regarding the AP’s use of the qualifying words “record” and “significantly” to describe the results was well placed. If the AP was referring to a comparison between the first quarters of 2014 and 2016, the statement that the number of uninsured Americans had reached a record low was defensible. However, in stating that progress towards insuring all Americans had slowed “significantly”, the AP may have fallen into the trap of Mark Twain’s third type of lie: using statistics inappropriately.[2]