…and what about American Idol?
This may seem like a trivial pursuit; indeed Vespignani himself said “we are always working on very serious (sometime gloomy) things like pandemic, deadly viruses, social turmoil. etc. For once we decided to do something more frivolous.” But if these big data analyses that we at Northeastern and elsewhere are so excited about don’t work in the simple cases, they will not work for more complex phenomena like politics, he said.
Also, the Idol voting story is much simpler than the race for the US presidency. The team deliberately opted for the simplest approach to the problem. “Refinements could be applied to Idol as well as politics,” Fabio Ciulla, first author of the study, said. Examples include sentiment analysis, demographic corrections and signals from individual users. “In addition, for the political arena there are much more data (single political issues, historical data, incumbent candidates) that can be used to calibrate statistical models.”
The team’s fundamental assumption, according to the paper, “is that the number of votes each contestant receives is proportional to the number of tweets that mention her.” They validated this assumption by looking at the Twitter activity during each voting period for the nine episodes leading up to this week’s finale (which happens Tuesday and Wednesday). Indeed, they found that the volume of Twitter activity (ignoring the content of the tweets, i.e., positive or negative sentiment) is directly correlated with the voting outcomes.
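To make that proportionality assumption concrete, here is a minimal sketch in Python. It is not the team's code: the function name, the contestant labels and the tweet counts are all made up for illustration. It simply turns raw mention counts into estimated vote shares and picks out the contestant with the smallest share.

```python
def estimate_vote_shares(tweet_counts):
    """Estimate each contestant's share of the vote.

    Assumes, as the paper does, that votes are proportional to the
    number of tweets mentioning a contestant. Tweet content is ignored.
    """
    total = sum(tweet_counts.values())
    if total == 0:
        raise ValueError("no tweets counted for this voting period")
    return {name: count / total for name, count in tweet_counts.items()}


# Hypothetical mention counts for one voting period (not real data).
counts = {"Contestant A": 6000, "Contestant B": 4000}

shares = estimate_vote_shares(counts)

# Under the assumption, the contestant with the smallest share is the
# one most at risk of elimination.
predicted_bottom = min(shares, key=shares.get)
```

With real data, the counts would come from tweets collected during the voting window only, and the estimate would be compared against the actual elimination each week.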
Twitter also allows users to identify their location on the globe. Vespignani’s team can use that info to look at the preferences of various areas around the US. California, for example, seems to love Jessica Sanchez, who is from Chula Vista, CA, and Louisiana was partial to Joshua Ledet, who hails from the Lake Charles area of that state.
The geolocalized data also revealed an interesting inconsistency: People in the Philippines tweet about Sanchez a whole bunch during the voting period, despite the fact that only US residents are allowed to vote. “Numerous websites explicitly address the issue of ‘voting tunnels’,” says Delia Mocanu, a PhD student at Northeastern studying the geolocalization of Twitter signals. For example, you’d have no problem finding the article “Filipinos in PHL can vote for Jessica Sanchez online using Skype Magic Jack and Vonage” if you Googled “Jessica Sanchez vote from Philippines” (which I just did).
“The anomaly concerning the Philippines (that in principle could not vote) is jumping to the eye,” said author Nicola Perra, who works in social systems characterization. “In politics anomalies would be more subtle to detect but one can hope to see anomalous patterns (such as manipulation of information, fake accounts etc.), using Twitter data.”
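A geographic check like the one that surfaced the Philippines signal can be sketched in a few lines of Python. Everything here is an assumption for illustration: the tweet records, the `country` field, the contestant labels and the helper names are hypothetical, and the team's actual pipeline may look quite different.

```python
from collections import Counter


def mentions_by_country(tweets, contestant):
    """Count tweets that mention `contestant`, grouped by country code."""
    counts = Counter()
    for tweet in tweets:
        if contestant.lower() in tweet["text"].lower():
            counts[tweet["country"]] += 1
    return counts


def ineligible_share(counts, eligible=("US",)):
    """Fraction of mentions that come from outside the eligible countries."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    outside = sum(n for country, n in counts.items() if country not in eligible)
    return outside / total


# Hypothetical geotagged tweets (not real data).
sample_tweets = [
    {"text": "Voting for Contestant A tonight!", "country": "US"},
    {"text": "Go Contestant A!", "country": "PH"},
    {"text": "contestant a all the way", "country": "PH"},
    {"text": "Contestant B was great", "country": "US"},
]

by_country = mentions_by_country(sample_tweets, "Contestant A")
share = ineligible_share(by_country)
```

A large share of mentions from places that in principle cannot vote would be a flag worth a closer look, though on its own it says nothing about whether votes were actually cast.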
The work also calls attention to the fact that publicly available Twitter data can have undesirable consequences in the realms of gambling and social influencing. “For example, the audience could be induced to alter their behavior in function of the situation they observe,” says Andrea Baronchelli, another coauthor of the study. “And the job of betting agencies could be dramatically simplified.”
In an interview a few months ago, one of my colleagues asked Vespignani if he didn’t find it a bit unsettling to put results as powerful as these into the public eye. “The data is out there,” he responded. “Others are doing this at the same time without telling the public but we are in academia which means I have to tell the world.”
The paper does not go so far as to make an overt prediction about the American Idol competition. The team will save that for Wednesday morning, after Tuesday night’s Twitter conversation. Last week it looked like Sanchez was leading, but what a difference a week…and a Tweet…can make!
Photo via The Express Tribune