Dataset: https://www.kaggle.com/datasets/eashankaushik/americansignlanguageactionrekognition
Action recognition is a challenging problem in deep learning. With the advancement of deep learning, ConvLSTM (8) and LSTM networks have been widely used for action recognition. Both approaches show promising results; in this project, we implement an architecture that fuses the two. We also take advantage of the human skeleton-based approach to action recognition, which provides a compact representation of human actions.
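As a rough sketch of what such a fusion can look like, the Keras model below combines a ConvLSTM branch over raw frames with an LSTM branch over per-frame pose keypoints and fuses them by concatenation. The layer sizes, input shapes, and fusion strategy are illustrative assumptions, not the exact architecture used in this project.

```python
# Illustrative sketch of a fused ConvLSTM + LSTM classifier (assumed shapes/sizes).
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_CLASSES = 10          # 10 ASL signs
SEQ_LEN = 30              # frames per clip (assumed)
H, W, C = 64, 64, 3       # resized frame dimensions (assumed)
NUM_KEYPOINTS = 33 * 4    # flattened MediaPipe pose landmarks per frame (assumed)

# Branch 1: ConvLSTM over raw video frames
frames_in = layers.Input(shape=(SEQ_LEN, H, W, C), name="frames")
x = layers.ConvLSTM2D(16, kernel_size=3, return_sequences=True, activation="tanh")(frames_in)
x = layers.MaxPooling3D(pool_size=(1, 2, 2))(x)
x = layers.ConvLSTM2D(32, kernel_size=3, return_sequences=False, activation="tanh")(x)
x = layers.GlobalAveragePooling2D()(x)

# Branch 2: LSTM over per-frame pose keypoints
pose_in = layers.Input(shape=(SEQ_LEN, NUM_KEYPOINTS), name="pose")
y = layers.LSTM(64, return_sequences=True)(pose_in)
y = layers.LSTM(64)(y)

# Fuse the two branches and classify
z = layers.Concatenate()([x, y])
z = layers.Dense(64, activation="relu")(z)
z = layers.Dropout(0.5)(z)
out = layers.Dense(NUM_CLASSES, activation="softmax")(z)

model = Model(inputs=[frames_in, pose_in], outputs=out)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```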
American Sign Language (ASL) is the primary means of communication within the deaf community, used in 40 countries and containing more than 10,000 phrases. However, only 1% of the population knows sign language, so recognizing ASL has various real-world applications. In this project we implemented a fused ConvLSTM and LSTM architecture to detect 10 ASL signs, achieving a test accuracy of 0.901 on our custom ASL dataset. To compare our architecture with other action recognition architectures, we also trained the model on the UCF-YouTube Action Dataset, achieving a test accuracy of 0.77. Human pose contains valuable information about ongoing human actions and can be used alongside video frames or on its own to detect actions; we use MediaPipe to detect and track human pose.
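A minimal sketch of per-frame pose extraction with MediaPipe is shown below; the function name, sequence length, and zero-padding strategy are assumptions for illustration and may differ from the project's preprocessing code.

```python
# Illustrative sketch: extract one flattened pose vector per frame with MediaPipe.
import cv2
import mediapipe as mp
import numpy as np

mp_pose = mp.solutions.pose

def extract_pose_sequence(video_path, seq_len=30):
    """Return an array of shape (seq_len, 33 * 4): (x, y, z, visibility) per landmark."""
    cap = cv2.VideoCapture(video_path)
    keypoints = []
    with mp_pose.Pose(static_image_mode=False) as pose:
        while len(keypoints) < seq_len:
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB input; OpenCV reads frames as BGR
            results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.pose_landmarks:
                frame_kps = np.array(
                    [[lm.x, lm.y, lm.z, lm.visibility]
                     for lm in results.pose_landmarks.landmark]
                ).flatten()
            else:
                frame_kps = np.zeros(33 * 4)  # pad when no person is detected
            keypoints.append(frame_kps)
    cap.release()
    # Zero-pad short clips so every sequence has the same length
    while len(keypoints) < seq_len:
        keypoints.append(np.zeros(33 * 4))
    return np.array(keypoints)

# Example (hypothetical clip path): sequence = extract_pose_sequence("clips/hello_001.mp4")
```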