A COMPARATIVE STUDY OF PRE-TRAINED CNN ARCHITECTURES FOR DETECTING AI-GENERATED VERSUS HUMAN-CREATED IMAGES

Ayat Abd-Muti Alrawahneh; Siti Norul Huda Sheikh Abdullah; Amelia Natasya Abdul Wahab; Sarah Khadijah Taylor; Nik Rafizal Nik Ab. Rahim

doi:10.22452/mjca.vol1no.1.3

Authors

Ayat Abd-Muti Alrawahneh, National University of Malaysia
Siti Norul Huda Sheikh Abdullah, National University of Malaysia
Amelia Natasya Abdul Wahab, National University of Malaysia
Sarah Khadijah Taylor, CyberSecurity Malaysia
Nik Rafizal Nik Ab. Rahim, HLA Integrated Sdn Bhd

Keywords:

AI-generated images; deepfake detection; convolutional neural networks; transfer learning; image forensics; model evaluation

Abstract

The widespread use of AI-generated imagery, enabled by advanced generative models, poses increasing challenges to digital content verification and authenticity. This study evaluates the performance of four widely adopted convolutional neural network (CNN) architectures, namely ResNet50, EfficientNetV2B0, InceptionV3, and VGG16, for classifying images as AI-generated or human-created. A balanced dataset of approximately 80,000 labeled images was used, and all models were trained using a consistent transfer learning pipeline with ImageNet pre-trained weights. Images were resized to each model's input dimensions and preprocessed using architecture-appropriate normalization methods. The dataset was split using an 80/10/10 ratio for training, validation, and testing, and each model was trained for eight epochs without data augmentation so that the results reflect baseline performance.
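The split and the model-specific resizing described above can be illustrated with a minimal sketch. It makes several assumptions that the abstract does not confirm: the input sizes are the defaults of the Keras ImageNet builds of these four architectures, the split is a plain seeded shuffle rather than a stratified one, and the names `INPUT_SIZES` and `split_indices` are hypothetical helpers introduced only for illustration.

```python
import random

# Assumption: default input sizes of the Keras ImageNet builds of each
# architecture. The abstract only says sizes were "model-specific".
INPUT_SIZES = {
    "ResNet50": (224, 224),
    "EfficientNetV2B0": (224, 224),
    "InceptionV3": (299, 299),
    "VGG16": (224, 224),
}


def split_indices(n_items, seed=0, train_frac=0.8, val_frac=0.1):
    """Shuffle range(n_items) and cut it into train/val/test index lists.

    Hypothetical helper: a plain seeded shuffle, not a stratified split.
    Whatever is left after the train and validation cuts becomes the test set.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = round(n_items * train_frac)
    n_val = round(n_items * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


# Roughly 80,000 images, as stated in the abstract.
train_idx, val_idx, test_idx = split_indices(80_000)
print(len(train_idx), len(val_idx), len(test_idx))  # 64000 8000 8000
```

Under these assumptions, an 80/10/10 split of 80,000 images yields 64,000 training, 8,000 validation, and 8,000 test images. Because the shuffle is seeded, all four models would see identical partitions, which is what a consistent comparison requires.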

The models were evaluated on training and validation accuracy and loss. ResNet50 achieved the highest validation accuracy (97.13%) and the lowest validation loss (0.0861), indicating strong generalization capability. EfficientNetV2B0 followed closely, while InceptionV3 and VGG16 performed slightly worse on both metrics. Accuracy and loss curves showed that all models converged effectively, with ResNet50 demonstrating the most stable and efficient learning trajectory. A final performance comparison chart further highlighted the superior performance of ResNet50 and EfficientNetV2B0. These findings underscore the effectiveness of pre-trained CNN architectures in distinguishing between synthetic and real visual content. The study also establishes a performance baseline for future work in AI-generated image detection, contributing to the broader field of multimedia forensics and trustworthy AI.