Project Proposal: For Internship at D-Lab, School of Information Technology, from 15 May to 15 August 2017
A Binary Classification Model for Predicting Habitual Television Viewer Enjoyment
(Officially known as “Sentiment Analysis for the Elderly”)
1. Mr. Boyi Li, Engineering Science at University of Toronto
2. Philip Lu, Engineering Science at University of Toronto
3. Ms. Saila Shama, Engineering Science at University of Toronto
In Thailand’s aging society, there is a growing need to support the daily activities of the elderly. This project proposes an algorithm-based approach to modelling and predicting enjoyment of television watching, a habitual activity of the elderly population. By combining existing datasets of facial and postural data, a supervised binary classification model will be developed that predicts a viewer’s enjoyment of television content. This model would have significant applications in improving content delivery and the physiological health, mental engagement and long-term well-being of habitual elderly viewers. It is generally agreed that physical activity benefits physical health (Szanton et al., 2015) and social connectedness (Toepoel, 2012) more than sedentary activities such as television watching. Nevertheless, developing a deeper understanding of viewers’ emotional responses when watching television can be a first step to improving the quality of life of the elderly.
Machine learning is the use of computational techniques and statistics to learn the underlying probability distributions in real-world scenarios. Supervised learning algorithms are among the most widely used and successful in machine learning. In supervised learning, the goal is to create an algorithm that can associate a given input x with an output y. Deep learning, more specifically, is designed to overcome the challenges posed by high-dimensional data such as images. It has been used extensively in facial recognition tasks and has proven effective in generating feature-reduced representations of facial data (Sun et al., 2014). Combined representations using both facial and postural data, however, have rarely been put forward in existing deep learning literature.
The model predicts an elderly viewer’s enjoyment of television content based on the facial expressions and gestures they exhibit while watching television. Facial expressions will be mapped to two classes of sentiment: positive and negative. Positive feelings comprise happiness, surprise, and interest. Negative feelings comprise fear, anger, disgust, and sadness. These definitions are drawn from a collection of studies on universal emotional expressions (Duclos et al., 1989). Postures such as lying down, standing, and sitting are also analysed to improve the accuracy of emotion recognition, since they can directly reveal the psychological state and mental engagement of viewers. The candidate models are then trained on the known, reported emotions of the sample subjects in response to different videos. K-fold cross-validation is performed to select the model expected to perform best on the test set. Different genres of television content can be included as features to make the model more comprehensive. The output of the model on new input data can be compared with the known emotional response to test the algorithm’s performance.
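As a minimal sketch of the labelling scheme described above, the snippet below maps a reported emotion to a binary label and appends a posture encoding to a facial feature vector. The function names, the 1/0 label encoding, and the one-hot posture representation are illustrative assumptions and are not specified in the proposal.

```python
# Emotion groupings and postures as listed in the proposal text.
POSITIVE = {"happiness", "surprise", "interest"}
NEGATIVE = {"fear", "anger", "disgust", "sadness"}
POSTURES = ("lying down", "standing", "seated")


def to_binary_label(emotion):
    """Map a reported emotion to 1 (positive) or 0 (negative).

    The 1/0 encoding is an assumption made for this sketch.
    """
    key = emotion.strip().lower()
    if key in POSITIVE:
        return 1
    if key in NEGATIVE:
        return 0
    raise ValueError("unrecognised emotion: %r" % emotion)


def posture_one_hot(posture):
    """Encode a posture as a one-hot list in the order given by POSTURES."""
    key = posture.strip().lower()
    if key not in POSTURES:
        raise ValueError("unrecognised posture: %r" % posture)
    return [int(key == p) for p in POSTURES]


def combine_features(face_features, posture):
    """Concatenate facial features (hypothetical values) with the posture encoding."""
    return list(face_features) + posture_one_hot(posture)


# Example: one labelled sample with two made-up facial feature values.
sample = combine_features([0.2, 0.7], "seated")
label = to_binary_label("Happiness")
```

In practice the facial features would come from the learned representation rather than hand-entered values; the list used here only stands in for that vector.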
1. Construct a labeled dataset with combined features of facial and postural attributes from individuals watching video or television
2. Perform k-fold cross-validation using different machine learning classification models to determine the best-performing model on the validation sets
3. Run the chosen model on a pre-determined test set with known outputs and determine the generalization error
4. Consider improvements and applications of the results, such as controlling for the genre of TV content and targeting the model more specifically at elderly subjects
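Steps 2 and 3 above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the project's implementation: the data are synthetic, the two candidate models (a majority-class baseline and a nearest-centroid classifier) are simple placeholders for the classifiers that would actually be compared, and the choice of five folds and a 160/40 split is arbitrary.

```python
import numpy as np


def kfold_indices(n_samples, k, rng):
    """Shuffle sample indices and split them into k nearly equal folds."""
    return np.array_split(rng.permutation(n_samples), k)


class MajorityClass:
    """Placeholder baseline: always predict the most frequent training label."""

    def fit(self, X, y):
        self.label_ = int(np.bincount(y).argmax())
        return self

    def predict(self, X):
        return np.full(len(X), self.label_)


class NearestCentroid:
    """Placeholder model: predict the class whose mean feature vector is closest."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Euclidean distance from every sample to every class centroid.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]


def cv_accuracy(make_model, X, y, k, rng):
    """Mean validation accuracy of a model over k folds."""
    folds = kfold_indices(len(X), k, rng)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = make_model().fit(X[train_idx], y[train_idx])
        scores.append(np.mean(model.predict(X[val_idx]) == y[val_idx]))
    return float(np.mean(scores))


# Synthetic stand-in for the combined facial and postural features.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 5)) + 2.0 * y[:, None]  # class 1 is shifted by +2

# Hold out a test set before any model selection takes place.
X_dev, y_dev = X[:160], y[:160]
X_test, y_test = X[160:], y[160:]

# Step 2: compare candidates on identical folds (same seed for each).
candidates = {"majority": MajorityClass, "nearest_centroid": NearestCentroid}
cv_scores = {
    name: cv_accuracy(cls, X_dev, y_dev, k=5, rng=np.random.default_rng(1))
    for name, cls in candidates.items()
}
best_name = max(cv_scores, key=cv_scores.get)

# Step 3: refit the chosen model and estimate the generalization error.
best_model = candidates[best_name]().fit(X_dev, y_dev)
test_error = float(np.mean(best_model.predict(X_test) != y_test))
print(best_name, cv_scores, test_error)
```

Keeping the test set out of the cross-validation loop means the reported error is not biased by the model selection step.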
Team Members and Educational Background
Philip Lu is going into his third year of undergraduate study at the University of Toronto. He will be majoring in the Robotics option in the Engineering Science program.
Saila Shama is an exchange student at KMUTT from the University of Toronto. She completed her second year of studies in Engineering Science before coming to Thailand for the summer. She will be majoring in the Electrical and Computer Engineering option.
Boyi Li is going into his fourth year of undergraduate study at the University of Toronto. He is majoring in the Mathematics, Statistics, and Finance option in the Engineering Science program.
Software tools and libraries that will be used include Python (NumPy), OpenCV, and MATLAB, together with high performance computing (HPC) resources.
Philip Lu is primarily responsible for constructing the dataset to be used for the machine learning model. This involves collecting facial data and combining it with the postural data from previous work at the D-Lab. If there is missing or incompatible data, experiments may need to be designed to collect new data from human subjects. Philip would then also be responsible for securing additional resources such as time and volunteers, as well as ensuring that ethical standards are met.
Saila Shama will use OpenCV and high performance computing resources to implement the deep learning algorithms and obtain the parameters for emotion recognition. Specifically, she will apply convolutional neural networks (CNNs) to facial expression recognition.
Boyi Li will be responsible for validating the different models. Diagnostic tests will be applied to each model to improve the accuracy of the results and the performance of the algorithm. Based on an analysis of the results, specific suggestions will be provided to the data collectors to make the data more compatible.
Duclos, S. E., Laird, J. D., Schneider, E., Sexter, M., Stern, L., & Van Lighten, O. (1989). Emotion-Specific Effects of Facial Expressions and Postures on Emotional Experience. Journal of Personality and Social Psychology, 57(1), 100-108.
Sun, Y., Chen, Y., Wang, X., & Tang, X. (2014). Deep learning face representation by joint identification-verification. Advances in Neural Information Processing Systems 27 (NIPS 2014).
Szanton, S., Walker, R., Roberts, L., Thorpe, R., Wolff, J., Agree, E., et al. (2015). Older adults’ favorite activities are resoundingly active: Findings from the NHATS study. Geriatric Nursing, 36(2), 131-135. http://dx.doi.org/10.1016/j.gerinurse.2014.12.008
Toepoel, V. (2012). Ageing, Leisure, and Social Connectedness: How could Leisure Help Reduce Social Isolation of Older People? Social Indicators Research, 113(1), 355-372. http://dx.doi.org/10.1007/s11205-012-0097-6