Google may be a master at data wrangling, but one of its products has been making bogus data-driven predictions. A study shows that Google's much-hyped flu tracker has consistently overestimated flu cases in the US for years. It's a failure that highlights the danger of relying on big data technologies.

Google Flu Trends, which launched in 2008, monitors web searches across the US to find terms associated with flu activity such as "cough" or "fever". It uses those searches to predict up to nine weeks in advance the number of flu-related doctors' visits that are likely to be made. The system has consistently overestimated flu-related visits over the past three years, and was especially inaccurate around the peak of flu season – when such data is most useful. In the 2012/2013 season, it predicted twice as many doctors' visits as the US Centers for Disease Control and Prevention (CDC) eventually recorded. In 2011/2012 it overestimated by more than 50 per cent.

The study's lead author, David Lazer, of Northeastern University, says the fixes for Google's problems are relatively simple – much like recalibrating weighing scales. "It's a bit of a puzzle, because it really wouldn't have taken that much work to substantially improve the performance of Google Flu Trends," he says. Merely projecting current CDC data three weeks into the future yields more accurate results than those compiled by Google Flu Trends. Combining the two resulted in the most accurate model of all. Lazer says Google Flu Trends does have promise, especially at predicting flu trends over smaller areas than the CDC takes into account, which could enable individual cities or states to prepare.
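The two fixes Lazer describes – a lagged projection of CDC data, and a blend of that projection with the search-based estimate – can be sketched in a few lines. This is a hypothetical illustration of the general idea only, not the study's actual model; the function names, the fixed three-week lag, and the equal blending weight are all assumptions for demonstration.

```python
def lagged_projection(cdc_series, lag=3):
    """Naive baseline: predict week t by reusing the CDC figure
    from lag weeks earlier (the earliest figure for the first weeks)."""
    return [cdc_series[max(0, t - lag)] for t in range(len(cdc_series))]


def blended(gft_series, cdc_series, lag=3, weight=0.5):
    """Blend a search-based estimate with the lagged CDC baseline.
    weight=0.5 is an arbitrary equal split, not a fitted parameter."""
    baseline = lagged_projection(cdc_series, lag)
    return [weight * g + (1 - weight) * b
            for g, b in zip(gft_series, baseline)]


# Synthetic example: CDC counts and an inflated search-based estimate.
cdc = [10, 12, 15, 20, 30, 45, 40, 30]
gft = [22, 26, 33, 44, 65, 95, 82, 58]  # roughly 2x the CDC figures

print(lagged_projection(cdc))
print(blended(gft, cdc))
```

In the real study the combination would be fitted statistically rather than averaged with a fixed weight; the point of the sketch is only that the baseline requires no search data at all.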

Read the article at New Scientist →