The interdisciplinary Graduate Certificate in Data Analytics is offered through a collaboration between the College of Computer and Information Sciences and the College of Social Sciences and Humanities. The certificate curriculum emphasizes the skills needed to bridge emerging technological capacities and traditional policy-making processes. The program is designed to provide students with foundational knowledge in data science, including data management, machine learning, data mining, statistics, and visualizing and communicating data, that can be applied to data-driven decision-making in any discipline.



The Graduate Certificate in Data Analytics is offered in a 100-percent online format, enabling students from across the globe to take advantage of the program’s challenging classes and expert faculty. The flexible format is designed to allow students to take classes on their own schedule while interacting with classmates through discussion boards and other technology. Classes benefit from Northeastern’s signature experiential learning program, drawing on students’ current professional experiences to make real-world connections with subject matter.



Northeastern’s interdisciplinary Graduate Certificate in Data Analytics consists of four 4-credit foundation courses. Certificate course credits are transferable to master’s degree programs included in the university’s Data Science Programs.


Foundation Courses

Introduction to Computational Statistics

This course provides an introduction to the fundamental techniques for quantitative data analysis, with an emphasis on large or complex data sets. It lays the foundation for many of the other courses in the Data Science Programs, including topics such as data acquisition and management, scripting and sampling, probability and statistical tests, econometric models, and data visualization. These diverse skills are developed using the R statistics language and data sets that emphasize real-world data problems. The course begins with a review of probability and statistics, then progresses to data manipulation, sampling, and scripting; statistical tests; OLS regression; categorical dependent variables; maximum likelihood methods; time series; and hierarchical models. The course finishes with a brief introduction to machine learning methods and visualization using R. Throughout, there will be an emphasis on the challenges and limitations of modeling big data, and students will finish with the basic skills needed to manipulate and model complex data and present their insights to non-experts.


Collecting, Storing, and Retrieving Data

Students learn how to build large-scale information repositories for different types of information objects so that later these data can be selected, retrieved, and transformed for analytics and discovery. Students will learn how traditional approaches to data storage can be applied alongside modern approaches that use massively parallel computation and non-relational data structures. Through case studies, readings on background theory, and hands-on experimentation, students will learn how to select, plan, and implement storage, search, and retrieval components of large-scale, structured and unstructured information repositories. In particular, students will be able to assess and recommend efficient and effective large-scale information storage and retrieval components that provide data scientists with properly structured, accurate, and reliable access to information needed for investigation.

Introduction to Data Mining/Machine Learning

This course provides an introduction to the fundamental techniques for data mining and covers several basic learning algorithms, along with common data types, implementation and execution, and analysis of results. It teaches students how models are learned from data, both algorithmically and practically. Coding can be done in R, MATLAB, or Python, but students must demonstrate the ability to set up and run learning algorithms on various data sets, test models on new data, choose appropriate techniques for particular data sets or tasks, and evaluate and present results to non-experts.


Information Design and Visual Analytics

This course introduces the systematic use of visualization techniques for supporting the discovery of new information as well as the effective presentation of known facts. Based on principles from art, graphic design, perceptual psychology, and rhetoric, students will learn how to successfully choose appropriate visual languages for representing various kinds of data in order to support insights relevant to the user’s goals.

Topics covered in this course include: visual data mining techniques and algorithms for supporting the knowledge discovery process, principles of visual perception and color theory for revealing patterns in data, semiotics and the epistemology of visual representation, narrative strategies for communicating and presenting information and evidence, and the critical evaluation of data visualizations.