Computer Science students present data analysis system for Medicare data

By: Dr. Ian Gorton, Director of Computer Science, Northeastern University-Seattle

On Thursday, December 10th, six students in the CS8647 Project Class (Timothy Cowley, Brian Gillespie, Zheyu Jin, Hunter Jorgensen, Doyle Ravnaas, and Joshua Shaham) demonstrated and described a data analysis system they had built for publicly available Medicare Part B data. This data set was recommended by the NIST Big Data Reference Architecture group for exploring the efficacy of their initial reference architecture for Big Data systems. To this end, the students analyzed the data, devised a set of queries and analyses they could perform on it, and built the Web-based analysis system using state-of-the-art big data technologies hosted on the Amazon Cloud.

The data set used in the class provided summarized data about the procedures performed by Medicare providers in the USA in 2012 and the associated costs. Given this data, there were many questions that could be asked, and the students chose to implement several, including:

  • For a particular medical procedure, list the most/least expensive providers in a given state.
  • Given a charge that a patient had paid for a specific procedure, how does this compare nationally or within the state with charges from other providers for the same procedure?
  • Based on provider Zip Codes only, infer if a particular provider is based in a rural or urban environment.
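
To make the first two questions concrete, here is a minimal Python sketch that works on a handful of in-memory records. It is not the students' implementation, which ran against Cassandra and Solr; the record layout, field names, function names, and sample rows are all assumptions made for illustration.

```python
from bisect import bisect_right
from typing import NamedTuple


class Charge(NamedTuple):
    # Hypothetical record layout; the real data set's columns may differ.
    provider: str
    state: str
    procedure: str
    avg_charge: float


def rank_providers(records, procedure, state, n=3, cheapest=True):
    """Return up to n providers for a procedure in a state, sorted by charge."""
    matches = [r for r in records if r.procedure == procedure and r.state == state]
    matches.sort(key=lambda r: r.avg_charge, reverse=not cheapest)
    return matches[:n]


def charge_percentile(records, procedure, paid, state=None):
    """Percentage of matching provider charges that are at or below `paid`.

    With state=None the comparison is national; otherwise it is per state.
    """
    charges = sorted(
        r.avg_charge
        for r in records
        if r.procedure == procedure and (state is None or r.state == state)
    )
    if not charges:
        raise ValueError("no charges recorded for this procedure")
    return 100.0 * bisect_right(charges, paid) / len(charges)


# Made-up sample rows, for illustration only.
SAMPLE = [
    Charge("Provider A", "WA", "PROC-1", 80.0),
    Charge("Provider B", "WA", "PROC-1", 120.0),
    Charge("Provider C", "WA", "PROC-1", 100.0),
    Charge("Provider D", "OR", "PROC-1", 140.0),
]
```

Calling `rank_providers(SAMPLE, "PROC-1", "WA", n=1)` picks the cheapest Washington provider in the sample, and `charge_percentile(SAMPLE, "PROC-1", 100.0)` places a $100 charge against all four sample rows. At the scale of the real data, this filtering and sorting would be pushed down to the data store rather than done in application memory.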

The data set contained tens of millions of records on providers and procedures, and this was loaded into a Cassandra database, which acted as the primary data store. The data was also indexed using Solr to provide richer query capabilities than those provided by Cassandra. Apache Spark was used to analyze the data sets using machine learning techniques to predict, for example, if a given provider was located in an urban or rural setting. To provide Web-based query capabilities, the Spring framework hosted the query business logic, and the presentation layer was implemented in a Web browser using JavaScript and the D3 visualization library.
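
The urban or rural prediction can be illustrated with a much simpler stand-in than a trained model. The Python sketch below is a plain counting heuristic, not the Spark machine learning approach the students used: it labels a Zip Code as urban when enough providers list it. The feature (providers per Zip Code), the threshold, the function name, and the sample Zip Codes are all assumptions.

```python
from collections import Counter


def label_zip_codes(provider_zips, urban_threshold=3):
    """Label each Zip Code 'urban' or 'rural' by how many providers list it.

    Both the feature (providers per Zip Code) and the threshold are
    illustrative assumptions, not values taken from the class project.
    """
    counts = Counter(provider_zips)
    return {
        zip_code: "urban" if count >= urban_threshold else "rural"
        for zip_code, count in counts.items()
    }


# Made-up Zip Codes, one entry per provider.
labels = label_zip_codes(["00001", "00001", "00001", "00002"])
```

A learned model would replace the fixed threshold with a decision boundary fitted to labeled examples, which is presumably where Spark's machine learning libraries came in.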

Building this application was challenging due to the complexity of the technologies that were used, and the inherent difficulty of integrating the software into a scalable, reliable system architecture. The resulting design reflects many of the attributes of NIST’s Reference Architecture, and we are writing a report for NIST that provides detailed feedback on the class’s experiences.

The class also gave the students the opportunity to put into practice the skills they had learned in other courses such as Machine Learning, Web Development and Databases. The experience they gained solving real problems has greatly developed their software engineering skills, and will stand them in good stead when they graduate. The resulting application will also form the basis for subsequent experiential learning classes here at Northeastern University’s Seattle campus.
