CS Seminar: Using data to predict students at risk of failure

Over half a million students fail to graduate from high school every year. In higher education, similar issues of retention arise, especially for STEM students. Experienced educators can pinpoint students at risk of failure, but this approach doesn’t scale well, cannot be used to rank students by level of risk, and is open to personal biases. Better solutions are needed.

Dr. Everaldo Aguiar’s PhD research looked at how machine learning, applied to the large amounts of historical data that schools collect, could be used to identify at-risk students. In the recent Computer Science Seminar held May 19 at Northeastern University–Seattle, Dr. Aguiar presented the development, deployment and evaluation of machine learning models that detect, ahead of time, students at risk of underachieving their academic goals.

Working with a large high school district with 150K+ students, he collected data from 6th to 12th grade. 12.4% of students within the data set did not graduate on time.

The district collected academic performance data, behavior data (absences, suspensions, etc.) and mobility data (such as moves from other districts). This data was used to train a machine learning model that produced a ranked list of the students most likely to drop out.

The resulting model identified at-risk students with over 70% accuracy, which was twice as good as the school’s existing early warning system. Based on these predictions, an interactive dashboard was created that showed educators each student’s risk of failure along with the influencing factors behind the ranking, such as high absence rates.
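To make the idea of a ranked list with influencing factors concrete, here is a minimal Python sketch. It is not Dr. Aguiar's model: the talk summary does not say which algorithm or features were used, so the logistic scoring function, the feature names, the weights and the three student records below are all hypothetical and chosen only for illustration. A real system would learn its weights from historical data.

```python
import math

# Hypothetical weights and bias; a real model would learn these from data.
WEIGHTS = {"absence_rate": 3.0, "suspensions": 0.8, "school_moves": 0.5, "gpa": -1.2}
BIAS = -1.0


def risk_score(student):
    """Logistic score in (0, 1); higher means higher estimated risk."""
    z = BIAS + sum(WEIGHTS[name] * student[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def top_factor(student):
    """Name of the feature contributing most to this student's score."""
    return max(WEIGHTS, key=lambda name: WEIGHTS[name] * student[name])


def rank_by_risk(students):
    """Return student records sorted from highest to lowest risk."""
    return sorted(students, key=risk_score, reverse=True)


# Made-up records, not data from the study.
students = [
    {"id": "A", "absence_rate": 0.02, "suspensions": 0, "school_moves": 0, "gpa": 3.6},
    {"id": "B", "absence_rate": 0.30, "suspensions": 2, "school_moves": 1, "gpa": 1.9},
    {"id": "C", "absence_rate": 0.10, "suspensions": 0, "school_moves": 2, "gpa": 2.8},
]

for s in rank_by_risk(students):
    print(s["id"], round(risk_score(s), 3), top_factor(s))
```

Sorting by score rather than applying a fixed cutoff lets educators work down the list as far as their time allows, and reporting the largest contributing feature is one simple way a dashboard could explain each ranking.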

In higher education, Aguiar said, 50% of STEM bachelor’s degree candidates drop out or change majors. By surveying students about their interest in STEM and correlating the responses with their grades, it was possible to predict who might want to switch majors.

Interestingly, academic performance alone was a poor predictor of dropping out of STEM, and it was important to include other factors to predict accurately. So in addition, data was gathered on student engagement, based on their activity on a university-provided social media website. It turned out that the engagement measures were the most important predictor of a student’s performance in STEM, and when they were included in the model, the accuracy of predicting which students would drop out rose to around 90%.

– Dr. Ian Gorton, Director of Computer Science, Northeastern University–Seattle

Everaldo Aguiar earned his PhD in Computer Science and Engineering from the University of Notre Dame in 2015. He now lives in Seattle working as a data scientist at Concur. Starting this September, he will join the adjunct computer science faculty at Northeastern University-Seattle teaching the graduate Data Mining course.

Learn more about Northeastern’s Master of Science in Computer Science and our innovative ALIGN Computer Science program, designed to provide students from non-technical backgrounds with an opportunity to become world-class computer scientists.

The Computer Science Seminar Series showcases leading experts to discuss a range of top-of-mind topics. Curated by Director of Computer Science Dr. Ian Gorton, the talks provide a great opportunity for students and industry to gather in the classroom and network at Northeastern University-Seattle’s campus in South Lake Union. We are taking a short break over the summer and will resume these monthly events in September. Sign up for the NU-Seattle newsletter to find out when: bit.ly/NUSeattleNewsletter
