Exploring A Novel Multi-Channel Structure to Improve Facial Expression Recognition On Occluded Samples Using Deep Convolutional Neural Network
Keywords:
Facial Expression Recognition, Partial Occlusion, Convolutional Neural Network, Transfer Learning, Histogram of Oriented Gradients, Haar Wavelet

Abstract
The development of Artificial Intelligence (AI) models that accurately predict human facial expressions remains a significant challenge in cases where masks and sunglasses cover critical facial areas. Given that a substantial portion of human interaction involves non-verbal communication, accurately detecting human emotions such as anger, fear, disgust, happiness, sadness, and surprise would benefit a wide range of applications, from security assessments to psychological treatments. To address this challenge, the current study explores the performance of a novel multi-channel arrangement comprising Haar-wavelet, Histogram of Oriented Gradients (HOG), and grayscale filters to improve the predictions of a deep Convolutional Neural Network (CNN) on occluded samples. This study uses the FER-2013 dataset and produces occluded samples by applying a virtual mask that covers almost 55% of the facial area, including the mouth, lips, and jaw. Further investigations, including the impact of each filter, the use of pre-trained models on occluded samples (transfer learning), and comparisons to prior models, are also carried out. The proposed approach yields an accuracy rate of 71% for non-occluded and 66% for occluded samples, which is 6% to 11% higher than the base model. Applying transfer learning further increases the accuracy metrics by 18%, indicating that models pre-trained on non-occluded data can reveal a broader range of features and their relations, which to some extent compensates for the features removed by the occlusion. These results suggest the potential of the proposed technique for similar imaging applications.
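To illustrate the multi-channel arrangement described above, the sketch below builds a three-channel input from a single grayscale face image: the raw grayscale map, a one-level Haar-wavelet approximation band, and a simplified HOG-like gradient-orientation map. This is an illustrative NumPy-only approximation, not the authors' implementation; the function names, the 48x48 input size (matching FER-2013), and the per-pixel HOG encoding are assumptions for the sake of a self-contained example.

```python
import numpy as np

def haar_level1(img):
    """One-level 2D Haar approximation band (2x2 block averages),
    upsampled back to the input size so it can be stacked as a channel."""
    a = (img[0::2, 0::2] + img[0::2, 1::2] +
         img[1::2, 0::2] + img[1::2, 1::2]) / 4.0
    return np.kron(a, np.ones((2, 2)))  # nearest-neighbour upsample

def hog_like(img, n_bins=9):
    """Simplified per-pixel gradient-orientation map; a crude stand-in
    for a full block-normalized HOG descriptor."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)        # unsigned orientation
    bins = np.floor(ang / np.pi * n_bins).astype(int) % n_bins
    return mag * (bins / (n_bins - 1))             # magnitude-weighted bin index

def build_channels(gray):
    """Stack grayscale, Haar, and HOG-like maps into an (H, W, 3) tensor
    suitable as CNN input, each channel normalized to [0, 1]."""
    chans = [gray.astype(float), haar_level1(gray), hog_like(gray)]
    chans = [(c - c.min()) / (np.ptp(c) + 1e-8) for c in chans]
    return np.stack(chans, axis=-1)

face = np.random.rand(48, 48)   # placeholder for a 48x48 FER-2013 sample
x = build_channels(face)
print(x.shape)                  # (48, 48, 3)
```

In this arrangement each filter contributes complementary information: the grayscale channel preserves raw intensity, the Haar band emphasizes coarse structure, and the gradient map highlights edges, which may help the CNN when occlusion removes part of the face.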