In the past, a scholar would have had to spend years on intensive research to assemble a broad humanities-based assessment of a topic like the role of race in 19th-century literature.
“That would require reading for years,” said Ryan Cordell, a new assistant professor of English in the College of Social Sciences and Humanities at Northeastern. “And after all that time, he or she would have read 0.0001 percent of what was written in that era. There are limits of what you can physically read.”
Enter the emerging field of digital humanities, which applies computer and network-science techniques to digitized texts, like the massive volumes of literature that have been scanned and stored over the past two decades.
“The Internet Archive has scanned more than 2 million public-domain books spanning 500 years, so we can see how language, words and syntax change over time — or look at any broad trend that exists,” said David Smith, a new assistant professor in the College of Computer and Information Science. He was previously a research assistant professor at the University of Massachusetts-Amherst and in 2010 received a Ph.D. from Johns Hopkins University.
Smith and Cordell are among the faculty members founding Northeastern’s new Centers for Digital Humanities and Computational Social Science, an interdisciplinary base for researchers from schools including the College of Computer and Information Science, the College of Social Sciences and Humanities and the College of Science.
“By turning these archives into data, we can make quantitative and replicative analysis,” said Smith. Examples include tracing how information spreads through a society over time and examining literature for evidence of issues like social mobility during a particular era.
Cordell, who received his Ph.D. from the University of Virginia in 2010, enters the field from a humanities perspective: While working on his dissertation, he began to track the (usually uncredited) spread of a piece by Nathaniel Hawthorne through newspapers and other publications across the United States. Hawthorne himself used the term “pirating” to describe his work’s spread, well before the word came into pervasive use, and Cordell was curious whether the same phenomenon existed with other publications.
“If you don’t know what is going to be reprinted, you’re left comparing everything to everything else,” said Smith, who explained how digital-humanities methods allow researchers to turn text into searchable data, which can be organized and assessed with network-science techniques. “What you ultimately get are network maps that let us theorize how these publications were talking to one another and explain how this information spread.”
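To make the idea concrete, here is a minimal sketch of one common way to avoid comparing everything to everything else. It is not a description of Smith and Cordell's actual method, which the article does not specify. The sketch assumes each publication's text is available as a plain string, breaks each text into overlapping five-word sequences, and uses an index of those sequences so that only publications sharing a passage are ever paired. The function names, the parameters `n` and `min_shared`, and the sample texts are all invented for illustration.

```python
from collections import defaultdict
from itertools import combinations


def shingles(text, n=5):
    """Return the set of overlapping n-word sequences found in a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def reuse_edges(docs, n=5, min_shared=2):
    """Link pairs of publications that share at least min_shared sequences.

    docs maps a (hypothetical) publication name to its full text.
    The result maps each linked pair to its number of shared sequences.
    """
    # Inverted index: word sequence -> names of publications containing it.
    index = defaultdict(set)
    for name, text in docs.items():
        for s in shingles(text, n):
            index[s].add(name)

    # Count overlaps only for pairs that appear together in the index,
    # so texts with nothing in common are never compared directly.
    shared = defaultdict(int)
    for names in index.values():
        for a, b in combinations(sorted(names), 2):
            shared[(a, b)] += 1

    return {pair: count for pair, count in shared.items() if count >= min_shared}


# Invented sample texts, for illustration only.
docs = {
    "Paper A": "the quick brown fox jumps over the lazy dog near the river bank",
    "Paper B": "our correspondent writes the quick brown fox jumps over the lazy dog today",
    "Paper C": "an entirely unrelated notice about shipping schedules and grain prices this week",
}
edges = reuse_edges(docs)  # only Paper A and Paper B end up linked
```

Each linked pair can be read as an edge in a network map of the kind Smith describes, with the count of shared sequences as a rough measure of how much text the two publications have in common. Real scanned newspapers contain transcription errors, so a production system would need fuzzier matching than the exact word sequences assumed here.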
Both Cordell and Smith will be teaching courses for undergraduate and graduate students this fall: Smith a course on information retrieval, and Cordell one on technologies of text, which he jokes covers “a history of reading from the scroll to the scroll.”
– by Matt Collette