These days, when people feel a fever and a sore throat coming on, their first move often isn’t to the medicine cabinet. Instead, it’s to a computer or smartphone to Google their symptoms.

These queries, which make up only a tiny fraction of the more than 7 billion total queries the search engine handles each day, are all stored by Google. The company uses this data for a variety of reasons: it can help Google improve its search results for users (which also boosts the company’s bottom line), and it can also benefit the population as a whole in other ways.

One example of the latter is Google Flu Trends (GFT), a statistical model developed by engineers at Google.org, the company’s philanthropic arm, in an effort to “now-cast” what’s happening with the flu on any given day.

But research has shown that GFT often misses its target. These results led Northeastern University network scientists and their colleagues to take a closer look at how Big Data should be used to advance scientific research. Their report was published online Thursday in the journal Science.

“Big Data have enormous scientific possibilities,” said Northeastern professor David Lazer. “But we have to be aware that most Big Data aren’t designed for scientific purposes.” Fully achieving Big Data’s enthusiastically lauded potential, he added, requires a synthesis of computer science approaches to data and traditional approaches from the social sciences.

The paper was co-authored by Lazer, who holds joint appointments in the Department of Political Science and the College of Computer and Information Science; Alessandro Vespignani, the Sternberg Family Distinguished University Professor of Physics at Northeastern, who has joint appointments in the College of Science, Bouvé College of Health Sciences, and the College of Computer and Information Science; Northeastern visiting research professor of political science Ryan Kennedy; and Gary King, a professor in the Harvard University Department of Government.

Northeastern network science researchers David Lazer (left) and Alessandro Vespignani (right) examine how Big Data can best be utilized for scientific gain in a report published online on Thursday in the journal Science. Photos by Brooks Canaday.

“In a sense, Google Flu Trends is not bad, but it’s no better than any basic approach to time series prediction,” Vespignani said. “So the issue is in the claims and the disregard of other techniques or data more than the actual result.”

In their paper, the researchers explain where Google Flu Trends went wrong and examine how the research community can best utilize the outputs of Big Data companies, as well as how those companies should participate in the research effort.

By incorporating lagged data from the Centers for Disease Control and Prevention and making a few simple statistical tweaks to the model, Lazer said, the GFT engineers could have significantly improved their results. But in a companion report also released Thursday on the Social Science Research Network, an online repository of scholarly research and related materials, Lazer and his colleagues show that an updated version of GFT, which came about in response to a 2013 Nature article revealing GFT’s limitations, does little better than its predecessor.
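To make the idea of combining lagged CDC data with a search-based signal concrete, here is a minimal Python sketch. It assumes a simple linear model rather than the specification the researchers actually used: the current week’s CDC flu level is estimated from the most recent CDC figure available, assumed here to arrive two weeks late, plus the same week’s search-based estimate. The function names `fit_nowcast` and `nowcast`, the two-week lag, and the linear form are all hypothetical, chosen only for illustration.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    Assumes A is square and non-singular.
    """
    n = len(A)
    # Build the augmented matrix [A | b] so we don't mutate the inputs.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate entries below the pivot.
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_nowcast(cdc, search, lag=2):
    """Hypothetical model: cdc[t] ~ a + b*cdc[t-lag] + c*search[t].

    `cdc` and `search` are equal-length weekly series. Returns [a, b, c]
    fitted by ordinary least squares.
    """
    if len(cdc) != len(search):
        raise ValueError("cdc and search must have the same length")
    # One design row per week that has a lagged CDC value available.
    rows = [(1.0, cdc[t - lag], search[t]) for t in range(lag, len(cdc))]
    ys = [cdc[t] for t in range(lag, len(cdc))]
    # Normal equations: (X^T X) beta = X^T y.
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve_linear(XtX, Xty)


def nowcast(coef, cdc_lagged, search_now):
    """Estimate this week's level from the lagged CDC value and current search signal."""
    a, b, c = coef
    return a + b * cdc_lagged + c * search_now
```

In use, one would call `fit_nowcast` on historical CDC and search series, then pass the latest (lagged) CDC figure and the current search-based estimate to `nowcast`. Solving the normal equations by hand keeps the sketch dependency-free; a real analysis would more likely use a statistics library and check the fit on held-out seasons.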

While Big Data certainly holds great promise for research, Lazer said, it will only be successful if the methods and data are made at least partially accessible to the community. So far, that has not been the case with Google.

“Google wants to contribute to science but at the same time does not follow scientific praxis and the principles of reproducibility and data availability that are crucial for progress,” Vespignani said. “In other words, they want to contribute to science with a black box, which we cannot fully scrutinize and understand.”

If scientists are to “stand on the shoulders of giants,” as the old adage puts it, to move knowledge forward, they will need some help from the giants, Lazer said. Otherwise, failures like that of Google Flu Trends will be rampant, with the potential to tarnish our understanding of anything from stock market trends to the spread of disease.