Google Flu Trends, once a poster child for the power of big-data analysis, seems to be under attack.

This month, in a Science magazine article, four quantitatively adept social scientists reported that Google’s flu-tracking service not only wildly overestimated the number of flu cases in the United States in the 2012–13 flu season (a well-known miss) but has also consistently overshot in the last few years. Google Flu Trends’ estimate for the 2011–12 flu season was more than 50 percent higher than the cases reported by the Centers for Disease Control and Prevention. And, they wrote, for a period of more than two years ending in September 2013, the Google estimates were high in 100 out of 108 weeks.

The article, “The Parable of Google Flu: Traps in Big Data Analysis,” declared that Google was guilty of “big data hubris,” which the authors defined as the implicit assumption that big data sets trump traditional data collection and analysis. And they were skeptical of Google Flu Trends’ algorithmic smarts. “The comparative value of the algorithm as a stand-alone flu monitor is questionable,” they wrote.

A follow-up analysis by the four authors tracked Google Flu Trends’ performance in the just-concluded 2013–14 flu season, after Google updated its algorithm last October. There was some improvement, but the service still overshot by about 30 percent, the authors wrote in their paper, which was posted online.

Read the article at The New York Times →