Facebook is home to nearly 3 billion photos. Every minute, YouTube grows by another 100 hours of video. And, according to IHS Research, some 30 million surveillance cameras pepper our public spaces, collecting nearly 4 billion hours of footage each week. Needless to say, there’s a lot of image data that’s ripe for the picking.

Content like this helped break criminal cases such as the 2013 Boston Marathon bombing. But if we want to build on successes like that one, we’ll need ever more sophisticated algorithms to parse the data deluge.

For his part, Northeastern University assistant professor Raymond Fu is working to improve the current state of the art in biometrics software, which automatically distinguishes between different categories of people as well as between individuals themselves.

Fu’s research recently earned him one of two Young Investigator awards from the International Neural Network Society in 2014. “This is a real honor and inspires me to keep up the good work,” said Fu, a machine-learning expert who holds joint appointments in the College of Engineering and the College of Computer and Information Science.

Backed by funding from Samsung Research of America, the research and development arm of the international electronics company, Fu has recently begun developing visual recognition software for use on social media networks such as Facebook and Twitter.

“When people share facial images on social networks, those images are in the wild. So you have unconstrained data—meaning it’s not collected in the lab under controlled conditions,” said Fu. “It can be from multiple cameras, multiple resources, so the data has a lot of variables.”

To circumvent this problem, his algorithm ranks the information in all of the images and quickly tosses out any outliers. “If something is very different from the rest of the images, our algorithm can rule it out and mitigate noise,” he explained.
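The rank-and-reject idea can be sketched in miniature. The code below is an illustrative stand-in, not Fu’s actual algorithm: it assumes each image has already been reduced to a feature vector (a hypothetical “embedding”), ranks every vector by its average distance to the rest of the set, and drops any that sit unusually far from the group.

```python
import numpy as np

def filter_outliers(embeddings, z_thresh=1.5):
    """Rank feature vectors by mean distance to the rest of the set
    and keep only those that are not statistical outliers.

    embeddings: (n, d) array, one feature vector per image.
    Returns the indices of the vectors that are kept.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    # Pairwise Euclidean distances between all vectors.
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Average distance from each vector to all the others.
    mean_dist = dists.sum(axis=1) / (len(embeddings) - 1)
    # Flag vectors whose average distance is a z-score outlier.
    z = (mean_dist - mean_dist.mean()) / (mean_dist.std() + 1e-12)
    return np.where(z < z_thresh)[0]

# A tight cluster of "in the wild" features plus one noisy outlier.
faces = [[0.10, 0.20], [0.12, 0.19], [0.09, 0.21], [0.11, 0.20],
         [5.00, 5.00]]
kept = filter_outliers(faces)  # the noisy fifth image is dropped
```

The z-score threshold here is a simple illustrative choice; a production system would tune the rejection rule to the noise actually seen in social-media imagery.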

The software can “learn” a person’s unique face and use that information to leverage the vast stores of image data online to understand society or inform investigations. For instance, Fu’s algorithms could help identify what types of people turn out at a protest, he said, by recognizing general characteristics rather than individual ones: Are the people photographed at demonstrations such as Occupy Wall Street carrying cameras and notebooks, and thus likely journalists? Are there more uniform-donning policemen than protesters?
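Characterizing a crowd by general traits rather than identities can be reduced to a tallying step over attribute labels. The sketch below is purely illustrative: it assumes a hypothetical upstream model has already emitted per-person attribute sets, and simply aggregates them into crowd-level counts without identifying anyone.

```python
from collections import Counter

# Hypothetical attribute labels from an upstream recognition model;
# the label names and the rules below are illustrative only.
detections = [
    {"camera", "notebook"},        # likely press
    {"uniform"},                   # likely police
    {"sign"}, {"sign"}, {"sign"},  # likely protesters
]

def profile_crowd(detections):
    """Tally general attributes across a photo set to characterize
    who turned out, without tracking any individual."""
    counts = Counter()
    for attrs in detections:
        if {"camera", "notebook"} & attrs:
            counts["press"] += 1
        elif "uniform" in attrs:
            counts["police"] += 1
        else:
            counts["protester"] += 1
    return counts

summary = profile_crowd(detections)
```

The point of the design is that only aggregate counts leave this step, which is what makes a group-level question (“more police than protesters?”) answerable without individual recognition.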

Of course, advertisers and corporations could also use this data for less-noble pursuits, such as targeting their products at particular groups or individuals. “There is always a trade-off between privacy and services,” said Fu. “Everything I’m doing uses data that’s publicly available. We’re trying to provide the best models for analyzing it.”

It’s up to the rest of us—you and me and our representatives—to determine how we should use those models.