American Sign Language Interpret using Web Camera and Deep Learning

Fernandez, Kurt Christian; Paredes, Quintin Brian; Perfecto, Janiño; DE GOMA, JOEL

doi:10.46254/EU05.20220089

Track: Machine Learning

Abstract

This paper applies deep learning models, particularly the 3dCNN and LSTM, together with OpenCV for webcam functionality and Google's MediaPipe framework, to develop an American Sign Language interpreter system. The study has two objectives. First, to test the proponents' own dataset on Tran's network model (Tran et al. 2020) and gather experimental results for comparison. Second, to modify Tran's network model by adding an LSTM layer to accommodate the temporal structure of the proponents' dataset. Training accuracy and F1-score were used as the basis for evaluating the performance of each network. Tran's network model performed well, achieving 81.6% accuracy during training and an F1-score of 81.8% in real time. In comparison, the proponents' model achieved 89.9% accuracy during training and 89.8% in real time. The most notable difference appears in real-time use, where the proponents' model classified the gestures more correctly. It relies on the sequence prediction made possible by the LSTM layer.
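As an illustration of the two metrics named above, the following is a minimal sketch of how accuracy and F1-score could be computed from true and predicted gesture labels. The abstract does not state how the F1-score was averaged across gesture classes, so macro averaging is assumed here; the function names and the gesture labels in the usage lines are hypothetical and not taken from the paper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # correct prediction for class t
        else:
            fp[p] += 1      # class p was predicted but was wrong
            fn[t] += 1      # class t was missed
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN), equivalent to the harmonic mean
        # of precision and recall for this class.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gesture labels, for illustration only.
y_true = ["hello", "hello", "thanks", "thanks", "yes", "yes"]
y_pred = ["hello", "thanks", "thanks", "thanks", "yes", "hello"]
print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

Macro averaging weights every gesture class equally regardless of how many samples it has; a weighted or micro average would give different numbers when the classes are imbalanced.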

Keywords
Deep Learning, Object Detection, MediaPipe, Computer Vision, OpenCV, 3dCNN, LSTM

Publisher: IEOM Society International
