As part of HUBweek, this Wednesday, September 28, BARI co-hosted a hackathon on “Data Science, Journalism, and the Future of Justice” with InkHouse and Northeastern University’s School of Journalism and College of Computer and Information Science. It was the inaugural HUBweek hackathon.
BARI co-director Dan O’Brien participated on a panel with Todd Wallack, of the Boston Globe’s Spotlight team, and Michelle Borkin, assistant professor in the College of Computer and Information Science (CCIS) at Northeastern. Also present were John Wihbey, assistant professor of journalism and new media at Northeastern, who served as emcee, as well as Randall Lane, editor of Forbes magazine, and Igor Tulchinsky, the founder and CEO of WorldQuant, who kicked off the evening with an opening conversation about data and business.
More than 60 attendees participated in the two-hour hackathon in the Curry Student Center on Northeastern’s campus. They worked with three administrative data sets from public sources describing crime and justice in Boston (40 years of homicide reports, crime reports provided by the City of Boston, and Field Interrogation & Observation records from the Boston Police Department) to explore pressing, emerging trends and deeper issues around fairness, community, safety, and privacy.
Students dove into massive data sets using programs like R or Python, looking for trends that uncovered new meaning and sparked further research. Working independently or in small groups, they focused on specific aspects and findings within the data that piqued their interest. The winner was Ben Towne, a PhD candidate at Carnegie Mellon University in Computation, Organizations, and Society, who presented insightful analyses revealing that a very small number of police officers are responsible for the lion’s share of stops.
Frank Dutan of The Groundtruth Project described the projects undertaken by a few groups of students:
A group of graduate journalism students […] wanted to know if the data shows a relationship relevant to the ongoing protests against police brutality and accusations of profiling in the U.S. […] They found that 17 percent of stop and frisk incidents involved a person wearing a hooded sweatshirt, a politically-affiliated piece of clothing or something similar. They initially wanted to create a calculator that determined the probability of being stopped and frisked based on the clothes the person is wearing.
Some even wrote their own programs, which could do more heavy lifting. During one of the presentations, a first-year computer science student, Jack Michaud, said that he wrote a program that could look for correlations between monthly crime rates and stop and frisk incidents. He found that crime and stop and frisk both peaked in May. The probable cause, according to his research? The Bruins’ elimination from the playoffs.
As Northeastern News reporter Bill Ibelle said,
But given the two-hour time limit, the point of the exercise was the process, not the results. “This was an opportunity for data geeks to meet one another, make contacts, and be energized by working in a room full of people who share their interests,” said O’Brien, who counts himself as a data geek as well. “We want to encourage a collaboration between disciplines and agencies. Our goal is to get people excited about using Big Data to gain a deeper understanding of how cities work.”