Scientists can predict your age, gender, political party, even income, based on what you tweet

The words we use and the people we follow on social media can reveal a lot more about us than we may think, especially to a computer scientist who knows where to look.

At the first event in our Computer Science Seminar Series on Thursday, Dr. Svitlana Volkova presented research on how social media predictive analytics offer unique opportunities to study people and their behaviors in real time and at an unprecedented scale. Using predictive models that assess the language of Twitter users and their followers (referred to as “neighbors”), scientists can infer users’ gender, political affiliations, emotions and even income.

By analyzing tweets with hashtags like #emo and #happy, researchers were able to classify the emotions and the positive or negative sentiment of tweets.
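To make the hashtag idea above concrete, here is a minimal Python sketch of how tweets might be labeled by the hashtags they contain. The hashtag-to-label mapping and the function name `label_by_hashtag` are illustrative assumptions, not the mapping or code used in the research described in the talk.

```python
import re

# Hypothetical mapping, chosen only for illustration; the hashtags and
# labels actually used in the research are not listed in this article.
HASHTAG_LABELS = {"#happy": "positive", "#emo": "negative"}

def label_by_hashtag(tweet):
    """Return (text, label) if the tweet carries exactly one kind of label
    hashtag, otherwise None (unlabeled or contradictory)."""
    found = [tag.lower() for tag in re.findall(r"#\w+", tweet)]
    labels = {HASHTAG_LABELS[tag] for tag in found if tag in HASHTAG_LABELS}
    if len(labels) != 1:
        return None
    # Remove the label hashtags so a model trained on the text
    # cannot simply memorize them.
    text = re.sub(
        r"#\w+",
        lambda m: "" if m.group().lower() in HASHTAG_LABELS else m.group(),
        tweet,
    )
    return " ".join(text.split()), labels.pop()

example = label_by_hashtag("Got the job today #happy")
```

Tweets labeled this way could then serve as training examples for a sentiment or emotion classifier, without anyone annotating them by hand.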

“Knowing who the speaker is, is really important,” said Dr. Volkova, describing the implications of predictive analytics for personalized recommendations and search, audience profiling, and targeted advertising. “Even if you have [a] private profile, one can learn many things about you from who you follow and who follows you.”

If social media users don’t self-identify with a particular age group or political party, the words they use in just five tweets could reveal these attributes, given the right predictive model. Dr. Volkova detailed the approaches her research team explored for predicting latent user demographics from dynamic data (100,000 tweets): constrained-resource batch classification, incremental bootstrapping, and iterative learning via interactive rationale (feature) crowdsourcing.
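As a rough illustration of how a model can learn from tweets one at a time and then guess an attribute from a handful of a user's tweets, here is a minimal Python sketch. It uses a plain multinomial Naive Bayes classifier with add-one smoothing, which is a stand-in chosen for simplicity and not the method from Dr. Volkova's research. The class name `IncrementalNB`, the labels `group_a` and `group_b`, and the training sentences are all made up for the example.

```python
import math
from collections import Counter, defaultdict

class IncrementalNB:
    """Toy multinomial Naive Bayes that is updated one labeled tweet at a time."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of tweets seen
        self.vocab = set()

    def update(self, text, label):
        """Fold a single labeled tweet into the running counts."""
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def predict(self, texts):
        """Score all of a user's tweets together and return the likeliest label."""
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)  # add-one smoothing
            score = math.log(n_docs / total_docs)
            for text in texts:
                for word in text.lower().split():
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)

# Made-up training data with placeholder labels.
model = IncrementalNB()
model.update("love the new album tour tonight", "group_a")
model.update("quarterly earnings meeting ran long", "group_b")

guess = model.predict(["earnings call tomorrow"])
```

Because `update` only adjusts counts, the model can keep learning as new tweets stream in without retraining from scratch, which is the basic appeal of incremental approaches over batch ones.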

Language on social media produces very unstructured data. Misspellings, abbreviations, irony, bot accounts, and how active users are all need to be considered when building models and assessing the accuracy of predictive analysis, Dr. Volkova said, urging caution when reading any papers on the subject.

Dr. Volkova discusses her research findings with attendees after a presentation at Northeastern University–Seattle.

Dr. Volkova, currently a research scientist at Pacific Northwest National Laboratory, received her PhD in Computer Science from Johns Hopkins University. Her PhD research focused on building predictive models for socio-linguistic content analysis in social media. Her work has centered on online models for streaming social media analytics, fine-grained emotion detection and multilingual sentiment analysis, and effective annotation techniques via crowdsourcing incorporated into the active learning framework. She interned at Microsoft Research in 2011, 2012 and 2014 with the Natural Language Processing and the Machine Learning and Perception teams.

Join us March 24 for our 2nd Computer Science Seminar!

Software running on mobile systems extensively tracks and leaks users’ personally identifiable information (PII), with traffic handled by third parties, leaving users with little visibility and control.

David Choffnes, assistant professor in the College of Computer and Information Science at Northeastern University, will discuss ReCon, a cross-platform system that reveals PII leaks in mobile devices and gives users control over them without requiring any special privileges or custom OSes. Register here to attend: http://bit.ly/CSSeminarReCon

The Computer Science Seminar Series is a monthly speaker event that showcases leading experts in computer science to discuss a range of top-of-mind topics. Curated by Director of Computer Science Dr. Ian Gorton, the series provides a great opportunity for industry to gather in the classroom and network at Northeastern University-Seattle’s campus in South Lake Union.
