Approximately 63 million individuals in India have significant hearing and speech impairments. This creates substantial communication barriers that restrict social, educational, and occupational inclusion. Current solutions rarely support bidirectional, real-time interaction tailored for Indian Sign Language (ISL). In this work, we introduce I-SRAVIA (Indian Sign-Language Responsive and Voice Intelligent Assistant), a computer vision-driven prototype enabling two-way communication between ISL users and hearing individuals.
The system employs a dataset of 1,149 ISL gesture images across nine classes captured by a webcam. This dataset was augmented to 14,937 samples by applying various transformations to the original images. A sequential Convolutional Neural Network (CNN) with a feed-forward Multi-Layer Perceptron (MLP) classification head was trained over 25 epochs (batch size 32) to classify images into the nine ISL words. Software evaluations based on training/validation accuracy, loss metrics, and confusion matrix analysis demonstrated near-perfect performance with zero misclassifications among five tested gestures. Real-time trials involving 160 gesture inputs produced a Mean Magnitude of Relative Error (MMRE) of 15.6%, equivalent to 84.4% prediction accuracy. Orientation robustness tests confirmed reliable gesture recognition within ±25° deviations. The user interface was developed with Flask, HTML/CSS, and JavaScript, following principles of human factors engineering, and supports both gesture-to-text/voice and voice-to-text modes.
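To make the training setup concrete, the following is a minimal sketch of a sequential CNN with an MLP classification head of the kind described above. The layer sizes, input resolution, and optimizer are assumptions for illustration; only the nine output classes, 25 epochs, and batch size of 32 come from the text.

```python
# Minimal sketch of a nine-class ISL gesture classifier (illustrative; not the
# authors' exact architecture). Input resolution and layer widths are assumed.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 9          # nine ISL gesture words (from the paper)
IMG_SHAPE = (64, 64, 3)  # assumed webcam frame size after preprocessing

def build_model():
    model = models.Sequential([
        layers.Input(shape=IMG_SHAPE),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        # feed-forward MLP head producing the nine-way classification
        layers.Dense(128, activation="relu"),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Training as described in the text (data loading omitted):
# model = build_model()
# model.fit(train_images, train_labels,
#           validation_data=(val_images, val_labels),
#           epochs=25, batch_size=32)
```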
These findings demonstrate the feasibility of a reliable, two-way ISL-based communication platform. Future work will expand the gesture lexicon, leverage enhanced computational resources, and conduct usability testing in real-world environments to move the system toward practical deployment.