Sensing Level of Interest via Machine Learning from Multiple Modalities

Ashish Kapoor, Professor Rosalind Picard


We are interested in combining multiple modalities to detect affect. Most work in affective computing to date has focused on a single channel of information. This project extends that earlier work by incorporating information from multiple modalities, concentrating on those that can be sensed comfortably while children are engaged in a natural learning interaction at a computer.

The initial modalities were chair pressure patterns, activity on the computer, and facial activity. Although the child could see all of the sensors, and the parents were informed about their use, the sensors did not appear to distract the child in any way from the learning task at hand.

The problem was initially posed as a combination of classifiers in a probabilistic framework that naturally explains the concepts of experts and critics. Each channel of data has an associated expert, which generates beliefs about the correct class using only that modality. Each expert in turn has a critic, which predicts how well that expert will perform on the current input. Probabilistic models of error, together with the critics, are used to combine the experts' beliefs about the correct class.
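As a rough illustration of the expert and critic idea, the sketch below weights each expert's class beliefs by its critic's score and averages them. This is a minimal example under stated assumptions, not the framework used in this work: the function name, the two-class setup, and all numbers are hypothetical, and the probabilistic models of error are left out.

```python
def critic_weighted_average(expert_beliefs, critic_scores):
    """Average per-modality class beliefs, weighting each expert by its critic.

    expert_beliefs: one list of class probabilities per modality (hypothetical).
    critic_scores:  one non-negative score per modality, read here as the
                    critic's estimate that the expert is right on this input.
    """
    total = sum(critic_scores)
    if total <= 0:
        raise ValueError("at least one critic score must be positive")
    n_classes = len(expert_beliefs[0])
    combined = [0.0] * n_classes
    for beliefs, score in zip(expert_beliefs, critic_scores):
        weight = score / total  # normalize so the weights sum to one
        for k, p in enumerate(beliefs):
            combined[k] += weight * p
    return combined


# Made-up beliefs over (interested, not interested) from three modalities.
beliefs = [[0.9, 0.1], [0.4, 0.6], [0.5, 0.5]]
critics = [0.8, 0.1, 0.1]  # the first expert is trusted most on this input
combined = critic_weighted_average(beliefs, critics)
```

Because the critic scores depend on the current input, the same expert can dominate on one sample and be nearly ignored on another.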


Latest Results using Mixture of Gaussian Process Classifiers:

We describe a unified approach, based on Gaussian Processes, for achieving sensor fusion under the problematic conditions of missing channels and noisy labels. Under the proposed approach, Gaussian Processes generate separate class labels corresponding to each individual modality. The final classification is based upon a hidden random variable, which probabilistically combines the sensors. Given both labeled and test data, inference over the unknown variables, parameters, and class labels for the test data is performed using the variational bound and Expectation Propagation. We apply this method to the challenge of classifying a student's interest level using observations from the face and postures, together with information from the task the students are performing. Classification with the proposed approach achieves an accuracy of over 83%, significantly outperforming classification using individual modalities and other common classifier combination schemes.
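The final combination step can be pictured as marginalizing over a hidden variable that selects which sensor to trust. The sketch below shows only that step, with fixed numbers, and handles a missing channel by dropping it and renormalizing. It is an assumption-laden illustration: the function name and values are hypothetical, and the Gaussian Process classifiers, the variational bound, and Expectation Propagation are not implemented here.

```python
def fuse_with_missing_channels(channel_posteriors, switch_weights):
    """Mix per-channel class posteriors under a hidden channel-selection variable.

    channel_posteriors: one list of class probabilities per channel, or None
                        when that channel is missing for this sample.
    switch_weights:     assumed probability that the hidden variable selects
                        each channel (fixed here; inferred in a real model).
    """
    available = [j for j, p in enumerate(channel_posteriors) if p is not None]
    if not available:
        raise ValueError("every channel is missing")
    mass = sum(switch_weights[j] for j in available)
    if mass <= 0:
        raise ValueError("available channels have zero total weight")
    n_classes = len(channel_posteriors[available[0]])
    fused = [0.0] * n_classes
    for j in available:
        weight = switch_weights[j] / mass  # renormalize over present channels
        for k in range(n_classes):
            fused[k] += weight * channel_posteriors[j][k]
    return fused


# Made-up example: the second channel dropped out for this sample.
posteriors = [[0.7, 0.3], None, [0.2, 0.8]]
weights = [0.5, 0.3, 0.2]
fused = fuse_with_missing_channels(posteriors, weights)
```

Renormalizing over the channels that are present keeps the fused output a valid probability distribution even when a sensor drops out.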


Here is a link to the paper, which will be presented at the Workshop on Multiple Classifier Systems, 2005.

Earlier Work using HMMs:
We evaluated the multi-sensor classification scheme on the task of detecting the affective state of interest in children trying to solve a puzzle. The sensory information from the face, the postures, and the state of the puzzle was combined in a probabilistic framework, and we demonstrated that this method achieves much better recognition accuracy than classification based on individual channels. Further, critic-driven averaging, which is a special case of the proposed framework, outperforms all the other classic rule-based classifier combination methods applied to this problem.
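For context, rule-based combination methods fuse the per-channel posteriors with a fixed rule that ignores how reliable each channel is on the current input. The sketch below shows four commonly cited rules (sum, product, max, min). Which rules were actually compared is not stated here, so the selection, the function name, and the numbers are assumptions for illustration.

```python
from math import prod


def combine_by_rule(posteriors, rule):
    """Fuse per-channel class posteriors with a fixed rule, then renormalize."""
    per_class = list(zip(*posteriors))  # group the channels' scores by class
    if rule == "sum":
        scores = [sum(c) for c in per_class]
    elif rule == "product":
        scores = [prod(c) for c in per_class]
    elif rule == "max":
        scores = [max(c) for c in per_class]
    elif rule == "min":
        scores = [min(c) for c in per_class]
    else:
        raise ValueError(f"unknown rule: {rule}")
    total = sum(scores)
    if total <= 0:
        raise ValueError("all combined scores are zero")
    return [s / total for s in scores]


# Made-up posteriors over (interested, not interested) from three channels.
posteriors = [[0.9, 0.1], [0.4, 0.6], [0.5, 0.5]]
by_sum = combine_by_rule(posteriors, "sum")
by_product = combine_by_rule(posteriors, "product")
by_min = combine_by_rule(posteriors, "min")
```

Unlike critic-driven averaging, none of these rules changes its weighting from one input to the next.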

Here is a link to the ICPR paper and the presentation. You can also search the Affective Computing Publications Page for other related work and for details of the work above. We are continuing to improve upon these early results with new fully Bayesian techniques for inference from multiple modalities. Watch for more work in this area on the AC Publications page.



This material is based upon work supported in part by the National Science Foundation under Grant No. 0087768 and in part by NASA through the Old Dominion University. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.