Abstract
Rapid advances in artificial intelligence and machine learning have given rise to an alarming trend of deepfake audio scams, which pose financial risks, threats to job security, disinformation hazards, and the potential for political manipulation. This research addresses the critical need to safeguard communication against such scams by improving voice authenticity and trustworthiness during calls, whilst raising public awareness of this emerging issue.
The methodology focuses on integrating a deepfake detection model into Jitsi, an open-source web communication application. The detection model includes an audio feature extraction engine, built on Python libraries such as PyAudio and librosa, that extracts features such as Mel-frequency cepstral coefficients (MFCCs) from raw audio data and arranges them in a dataframe. These features are then fed into a machine learning model: a Long Short-Term Memory (LSTM) network, chosen because it is well suited to sequential data processing and temporal analysis.
This makes it suitable for real-time audio monitoring and helps minimize latency, which aids the prompt identification of deepfake voices.
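The real-time monitoring described above can be sketched roughly as follows. This is only an illustrative outline under stated assumptions, not the implementation used in this research: it assumes per-frame feature vectors (for example MFCCs) are already available, and the names `sliding_sequences`, `monitor`, and `toy_score`, along with the window length, hop, smoothing, and threshold values, are all hypothetical. The trained LSTM is represented by a placeholder scoring function.

```python
from collections import deque
from statistics import mean
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

# One feature vector per audio frame (e.g. MFCCs); an assumption for this sketch.
FeatureFrame = Sequence[float]


def sliding_sequences(
    frames: Iterable[FeatureFrame], seq_len: int, hop: int
) -> Iterator[List[FeatureFrame]]:
    """Yield overlapping windows of `seq_len` frames, advancing `hop` frames each time."""
    if not 0 < hop <= seq_len:
        raise ValueError("hop must satisfy 0 < hop <= seq_len")
    buffer: deque = deque(maxlen=seq_len)
    since_last = 0
    for frame in frames:
        buffer.append(frame)
        since_last += 1
        if len(buffer) == seq_len and since_last >= hop:
            since_last = 0
            yield list(buffer)


def monitor(
    frames: Iterable[FeatureFrame],
    score_fn: Callable[[List[FeatureFrame]], float],
    seq_len: int = 50,
    hop: int = 25,
    smooth: int = 3,
    threshold: float = 0.5,
) -> Iterator[Tuple[float, bool]]:
    """Yield (smoothed_score, is_flagged) for each window of the stream."""
    recent: deque = deque(maxlen=smooth)
    for window in sliding_sequences(frames, seq_len, hop):
        recent.append(score_fn(window))  # score_fn stands in for the LSTM
        smoothed = mean(recent)          # average the last `smooth` scores
        yield smoothed, smoothed >= threshold


def toy_score(window: List[FeatureFrame]) -> float:
    """Placeholder for a trained model: mean of the first coefficient, clipped to [0, 1]."""
    value = mean(frame[0] for frame in window)
    return min(1.0, max(0.0, value))


if __name__ == "__main__":
    # Synthetic stream: six "genuine-like" frames followed by six "fake-like" frames.
    stream = [[0.0, 0.0]] * 6 + [[1.0, 0.0]] * 6
    for score, flagged in monitor(stream, toy_score, seq_len=4, hop=2, smooth=2):
        print(f"score={score:.2f} flagged={flagged}")
```

Overlapping windows let a decision be issued every `hop` frames without waiting for a full new window, and averaging recent scores dampens isolated misclassifications at the cost of a slightly delayed alert.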
An awareness game will be built into the front end to teach users the difference between a genuine and a manipulated voice. The results and implications of this research extend to mitigating the global prevalence of deepfake fraud.
The work emphasizes the important role of AI in countering evolving deceptive practices and contributes to a safer digital communication environment.