Track: Machine Learning
Abstract
This paper utilizes deep learning models, particularly a 3D CNN and an LSTM, together with OpenCV for webcam capture and Google's MediaPipe framework, to develop an American Sign Language interpreter system. The objectives of this study are as follows: first, to test the proponents' own dataset on Tran's network model (Tran et al., 2020) and gather experimental results for comparison; second, to modify Tran's network by adding an LSTM layer that accommodates the temporal structure of the proponents' dataset. Training accuracy and F1-score were used as the basis for each network's performance. Tran's network model performed well, achieving 81.6% accuracy during training and an 81.8% F1-score in real time. In comparison, the proponents' model achieved 89.9% accuracy during training and 89.8% in real time. The most notable difference appeared in real time, where the proponents' model classified the gestures more accurately, owing to the sequence prediction made possible by the LSTM layer.
Keywords
Deep Learning, Object Detection, MediaPipe, Computer Vision, OpenCV, 3D CNN, LSTM
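To make the architectural idea in the abstract concrete, below is a minimal, hypothetical Keras sketch of a 3D CNN followed by an LSTM layer for gesture-clip classification. The layer widths, frame count (30), input resolution (64x64), and class count are illustrative assumptions only; they are not the actual configuration of Tran's network or of the proponents' modified model.

import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 10        # assumed number of gesture classes
FRAMES, H, W = 30, 64, 64  # assumed clip length and frame size

model = models.Sequential([
    layers.Input(shape=(FRAMES, H, W, 3)),
    # 3D convolutions learn spatio-temporal features from the clip;
    # pooling only over the spatial axes keeps the frame dimension intact.
    layers.Conv3D(32, (3, 3, 3), padding="same", activation="relu"),
    layers.MaxPooling3D(pool_size=(1, 2, 2)),
    layers.Conv3D(64, (3, 3, 3), padding="same", activation="relu"),
    layers.MaxPooling3D(pool_size=(1, 2, 2)),
    # Flatten each frame's feature map so the LSTM receives a sequence.
    layers.TimeDistributed(layers.Flatten()),
    # The added LSTM layer models the temporal structure of the gesture.
    layers.LSTM(64),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

The key design point mirrored here is that pooling is applied only spatially, so the sequence of per-frame features survives the convolutional stages and can be fed to the LSTM for sequence prediction.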