In the past, a scholar would have had to spend years of intensive research to assemble a broad, humanities-based assessment of a topic like the role of race in 19th-century literature.

“That would require reading for years,” said Ryan Cordell, a new assistant professor of English in the College of Social Sciences and Humanities at Northeastern. “And after all that time, he or she would have read 0.0001 percent of what was written in that era. There are limits to what you can physically read.”

Enter the emerging field of digital humanities, which applies computer and network-science techniques to digitized texts, like the massive volumes of literature that have been scanned and stored over the past two decades.

“The Internet Archive has scanned more than 2 million public-domain books spanning 500 years, so we can see how language, words and syntax change over time — or look at any broad trend that exists,” said David Smith, a new assistant professor in the College of Computer and Information Science. He was previously a research assistant professor at the University of Massachusetts-Amherst and received a Ph.D. from Johns Hopkins University in 2010.
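Seeing how a word's use changes over time usually starts with relative frequency: occurrences of the word divided by all words written in a given period. The sketch below is a hypothetical illustration, not the Internet Archive's or Smith's tooling. It assumes each digitized book is available as a (year, text) pair, uses a crude regular-expression tokenizer, and groups books by decade; the function name `relative_frequency_by_decade` and the three-sentence corpus are invented for the example.

```python
import re
from collections import Counter, defaultdict


def relative_frequency_by_decade(docs, word):
    """Return {decade: share of that decade's tokens equal to `word`}.

    `docs` is an iterable of (year, text) pairs, a hypothetical stand-in
    for dated, digitized books.
    """
    counts = defaultdict(Counter)
    for year, text in docs:
        decade = year - year % 10  # e.g. 1856 -> 1850
        tokens = re.findall(r"[a-z']+", text.lower())  # crude tokenizer
        counts[decade].update(tokens)

    result = {}
    for decade in sorted(counts):
        total = sum(counts[decade].values())
        if total:  # skip decades with no tokens
            result[decade] = counts[decade][word.lower()] / total
    return result


# Invented toy corpus, not real historical texts.
corpus = [
    (1851, "The ship left the harbor at dawn today."),
    (1856, "News of the war reached the town quickly."),
    (1872, "Telegraph wires now crossed the whole wide continent."),
]
print(relative_frequency_by_decade(corpus, "the"))  # → {1850: 0.25, 1870: 0.125}
```

Normalizing by each decade's total token count matters because the number of surviving books varies widely from one period to the next; raw counts would mostly track how much was printed.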

Smith and Cordell are among the faculty members founding Northeastern’s new Centers for Digital Humanities and Computational Social Science, an interdisciplinary base for researchers from schools including the College of Computer and Information Science, the College of Social Sciences and Humanities and the College of Science.

“By turning these archives into data, we can make quantitative and replicative analysis,” said Smith. Such analyses might trace how information spreads through a society over time, or use literature to examine issues like social mobility during a particular era.

Cordell, who received his Ph.D. from the University of Virginia in 2010, enters the field from a humanities perspective. While working on his dissertation, he began to track the (usually uncredited) spread of a piece by Nathaniel Hawthorne through newspapers and publications across the United States. Hawthorne himself described the spread of his work as “pirating” before the term came into pervasive use, and Cordell was curious whether the same phenomenon existed with other publications.

“If you don’t know what is going to be reprinted, you’re left comparing everything to everything else,” said Smith, who explained how digital-humanities methods allow researchers to turn text into searchable data that can be organized and assessed with network-science techniques. “What you ultimately get are network maps that let us theorize how these publications were talking to one another and explain how this information spread.”
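The everything-to-everything comparison Smith describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the researchers' actual pipeline: it assumes each newspaper is already a plain-text string, treats two papers as linked when they share at least `min_shared` five-word sequences ("shingles"), and uses an inverted index so that only papers with overlapping text are ever paired. The names `shingles` and `reprint_network`, the newspaper titles and their text are all hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations


def shingles(text, n=5):
    """Set of overlapping word n-grams ("shingles") in a text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def reprint_network(papers, n=5, min_shared=2):
    """Return {(paper_a, paper_b): shared_shingle_count} for likely reprints.

    The inverted index maps each shingle to the set of papers containing it,
    so pairs are only counted when they actually share text.
    """
    index = defaultdict(set)
    for name, text in papers.items():
        for gram in shingles(text, n):
            index[gram].add(name)

    shared = defaultdict(int)
    for names in index.values():
        for a, b in combinations(sorted(names), 2):
            shared[(a, b)] += 1  # one more shingle in common

    return {pair: count for pair, count in shared.items() if count >= min_shared}


# Hypothetical papers: the Herald reprints part of a Gazette story.
papers = {
    "Gazette": "The old captain walked slowly down to the harbor, "
               "carrying a lantern against the fog.",
    "Herald": "A tale for our readers: the old captain walked slowly "
              "down to the harbor, carrying a lantern.",
    "Courier": "Cotton prices rose sharply this week in the southern markets.",
}
print(reprint_network(papers))  # → {('Gazette', 'Herald'): 8}
```

Each surviving pair is a weighted edge; loading those edges into a graph library would give a simple version of the network maps Smith describes. Scanned newspapers carry OCR errors, so real text-reuse detection would likely need fuzzier matching than exact shingles.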

Both Cordell and Smith will be teaching courses for undergraduates and graduates this fall: Smith a course on information retrieval, and Cordell one on technologies of text, which he jokes covers “a history of reading from the scroll to the scroll.”