32nd IEEE Conference on Signal Processing and Communications Applications, SIU 2024, Mersin, Türkiye, 15-18 May 2024
There are potential threats to the accuracy and reliability of voice recordings in forensic science, such as identity theft, the spread of misleading information, and the manipulation of legal evidence. As advances in artificial intelligence accelerate the production of fake digital documents in forensic medicine and digital forensics, distinguishing AI-generated voices from real voices has become a serious problem. In this study, we propose a system that distinguishes between voices produced by artificial intelligence and real human voices. In the proposed system, voice features are extracted using Mel Frequency Cepstral Coefficients (MFCC), and Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM) networks, and a hybrid model combining these two methods are used to separate AI-generated voices from real human voices. The performance of these deep learning-based classification algorithms was examined. Experiments show that the hybrid model outperforms the standalone CNN and LSTM models, classifying voices more accurately.
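To make the pipeline concrete, the following is a minimal sketch, not the authors' implementation: it extracts MFCC features with librosa and builds a hybrid CNN-LSTM classifier in Keras. The sampling rate, number of MFCC coefficients, frame count, layer sizes, and training settings are illustrative assumptions only.

```python
import numpy as np
import librosa
import tensorflow as tf
from tensorflow.keras import layers, models

def extract_mfcc(path, sr=16000, n_mfcc=40, max_frames=200):
    """Load an audio file and return a fixed-size (max_frames, n_mfcc) MFCC matrix."""
    signal, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc).T  # (frames, n_mfcc)
    # Pad or truncate along the time axis so every sample has the same shape.
    if mfcc.shape[0] < max_frames:
        mfcc = np.pad(mfcc, ((0, max_frames - mfcc.shape[0]), (0, 0)))
    return mfcc[:max_frames]

def build_hybrid_model(frames=200, n_mfcc=40):
    """CNN front-end over the MFCC matrix, LSTM over the resulting time steps,
    sigmoid output for the binary real-vs-AI decision (assumed architecture)."""
    inp = layers.Input(shape=(frames, n_mfcc, 1))
    x = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(inp)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, (3, 3), activation="relu", padding="same")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    # Collapse the frequency and channel axes so each remaining time step
    # becomes one feature vector for the LSTM.
    x = layers.Reshape((x.shape[1], x.shape[2] * x.shape[3]))(x)
    x = layers.LSTM(64)(x)
    x = layers.Dense(32, activation="relu")(x)
    out = layers.Dense(1, activation="sigmoid")(x)
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

if __name__ == "__main__":
    model = build_hybrid_model()
    model.summary()
```

In this sketch, the standalone CNN and LSTM baselines would reuse the same MFCC input while dropping the LSTM or convolutional stages, respectively; the hybrid variant chains them so that convolutional filters capture local spectral patterns and the LSTM models their temporal ordering.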