Tech companies—Facebook, Google and IBM, to name a few—are quick to tout the world-changing powers of “big data” gleaned from mobile devices, Web searches, citizen science projects and sensor networks. Never before has so much data been available covering so many areas of interest, whether it’s online shopping trends or cancer research. Still, some scientists caution that, particularly when it comes to data, bigger isn’t necessarily better.

Context is often lacking when information is pulled from disparate sources, leading to questionable conclusions. Case in point: the difficulties that Google Flu Trends (GFT) has experienced at times in accurately measuring influenza levels since Google launched the service in 2008. A team of researchers explains where this big-data tool is lacking—and where it has much greater potential—in a Policy Forum published Friday in the journal Science.

Google designed its flu data aggregator to provide real-time monitoring of influenza cases worldwide based on Google searches that matched terms for flu-related activity. Despite some success, GFT has overestimated peak flu cases in the U.S. over the past two years. GFT overestimated the prevalence of flu in the 2012–2013 season, as well as the actual levels of flu in 2011–2012, by more than 50 percent, according to the researchers, who hail from the University of Houston, Northeastern University and Harvard University. Additionally, from August 2011 to September 2013, GFT over-predicted the prevalence of flu in 100 out of 108 weeks.
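
Google’s actual GFT model is proprietary, but the sketch below gives a minimal, hypothetical picture of how this kind of search-based estimation works and how it can go astray: it fits a simple linear model from weekly flu-related query frequencies to an official influenza-like-illness (ILI) rate, then shows how a surge in searches by healthy people (context the model cannot see) inflates the estimate. The query terms, weights and data are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: 52 weeks of frequencies for three
# flu-related queries, plus the official ILI rate for each week.
weeks = 52
query_freq = rng.uniform(0.0, 1.0, size=(weeks, 3))
true_weights = np.array([2.0, 0.5, 0.3])
ili_rate = query_freq @ true_weights + rng.normal(0.0, 0.1, size=weeks)

# Fit weights by ordinary least squares (with an intercept column).
X = np.column_stack([np.ones(weeks), query_freq])
coef, *_ = np.linalg.lstsq(X, ili_rate, rcond=None)

# Predict two new weeks with identical actual illness. In the second,
# media coverage inflates the first query's volume among healthy people,
# context the model never sees, so the flu estimate rises spuriously.
normal_week = np.array([1.0, 0.4, 0.5, 0.5])
media_spike_week = np.array([1.0, 0.9, 0.5, 0.5])
print("estimated ILI, normal week:     ", normal_week @ coef)
print("estimated ILI, media-spike week:", media_spike_week @ coef)
```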

Read the article at Scientific American →