To gain a holistic view of human behavior in the classroom, our research integrates emotion recognition technology and biometric sensors into the university setting. Monitoring a learner's visual attention is a useful capability for AI in the classroom, so our Human Perception AI algorithms must be trained on massive amounts of real-world data that are collected and annotated to be context specific. We propose a multimodal system that measures engagement in educational settings from real-time data of two kinds: (1) physiological signals from the learner, and (2) classroom information obtained using computer vision. The system relies on conditional generative adversarial networks (GANs), in which the models are constrained by previously observed signals. Emotion and expression metrics measured in an educational setting differ from those needed in automotive, healthcare, or customer service applications. We are building a large database to train our models with deep learning techniques. The approach combines natural language processing, which uses text analysis and computational linguistics to identify, extract, and quantify subjective information (sentiment) in text; computer vision algorithms, which detect non-deterministic changes in facial or vocal expressions; and deep learning models, which address more complex problems with higher accuracy. Our models build on and extend the few initial attempts to develop evolutionary models from a range of emotion metrics using precise emotion classifiers.
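As an illustration of how two modalities might be combined into a single engagement estimate, the following minimal sketch uses confidence-weighted late fusion. This is an assumption made for illustration only: the abstract does not specify a fusion rule, and the names `ModalityReading` and `fuse_engagement`, the [0, 1] score range, and the confidence weights are all hypothetical rather than part of the proposed system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModalityReading:
    """One engagement estimate from a single modality (hypothetical structure)."""
    score: float       # engagement estimate, assumed to lie in [0, 1]
    confidence: float  # reliability of the estimate, assumed to lie in [0, 1]


def fuse_engagement(physio: ModalityReading,
                    vision: ModalityReading) -> Optional[float]:
    """Confidence-weighted average of the two modality scores.

    Returns None when neither modality carries any confidence,
    for example when both sensors have dropped out.
    """
    total = physio.confidence + vision.confidence
    if total == 0:
        return None
    weighted = physio.score * physio.confidence + vision.score * vision.confidence
    return weighted / total


if __name__ == "__main__":
    physio = ModalityReading(score=0.8, confidence=1.0)  # e.g. from a biometric sensor
    vision = ModalityReading(score=0.4, confidence=1.0)  # e.g. from a gaze estimate
    print(fuse_engagement(physio, vision))
```

With equal confidences the result is the plain mean of the two scores; a modality with zero confidence is ignored. A learned fusion model could replace this fixed rule.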