A W-Structured Encoder-Decoder Network Combining Swin Transformer and CNN for Image Inpainting Forensics
Abstract
Image inpainting is developing rapidly, driven by progress in deep learning techniques and generative models. This threatens the integrity of digital content, because inpainting techniques can produce highly realistic altered images that are not easily detected. To address these risks, this paper proposes a new deep learning-based image inpainting forensic network called W2SC-Net. The proposed architecture adopts a W-structured encoder-decoder design that integrates Swin Transformers with convolutional neural networks (CNNs). Specifically, the encoder block consists of two parallel streams that extract both local textures and global contextual information. The decoder block is connected with the downsampling stages to enable accurate reconstruction. In addition, a high-pass filtered enhancement block is employed to highlight inpainting artifacts. Extensive experiments demonstrate not only the high detection performance of the proposed model but also its strong generalization capability. Although it was trained on only one inpainting method, it can accurately detect image manipulations across ten inpainting methods and diverse image datasets. Moreover, the robustness of W2SC-Net against anti-forensics attacks is further improved by introducing an additional training process. Finally, W2SC-Net outperforms state-of-the-art forensic approaches in terms of the F1-score and AUC evaluation metrics.
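To illustrate why a high-pass filtered enhancement step can highlight inpainting artifacts, the following is a minimal sketch that computes a high-pass residual of a grayscale image. The 3x3 Laplacian-style kernel, the edge padding, and the names `HIGH_PASS_KERNEL` and `high_pass_residual` are assumptions chosen for illustration; the abstract does not specify the filter used in W2SC-Net.

```python
import numpy as np

# Hypothetical 3x3 Laplacian-style high-pass kernel. This is an assumption:
# the abstract does not state which filter the enhancement block uses.
HIGH_PASS_KERNEL = np.array([[-1, -1, -1],
                             [-1,  8, -1],
                             [-1, -1, -1]], dtype=np.float64)


def high_pass_residual(image):
    """Return the high-pass residual of a 2-D grayscale image (same shape)."""
    image = np.asarray(image, dtype=np.float64)
    # Replicate border pixels so the output keeps the input size.
    padded = np.pad(image, 1, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.float64)
    # Accumulate the weighted, shifted copies of the image (the kernel is
    # symmetric, so correlation and convolution coincide here).
    for dy in range(3):
        for dx in range(3):
            out += HIGH_PASS_KERNEL[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out
```

Because the kernel weights sum to zero, smooth regions produce a residual near zero, while abrupt local changes (such as subtle texture inconsistencies) are amplified. In a network like the one described, such a residual would typically be fed to the feature extractor alongside or instead of the raw image; how W2SC-Net does this is not stated in the abstract.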
Keywords
Deep Learning, Image Inpainting Forensics, Inpainting Detection, Convolutional Neural Network (CNN), Swin Transformer, Dual-Stream Feature Extraction
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
M. Mohamed, H. ElSayed and S. Ehssan Aly, "A W-Structured Encoder-Decoder Network Combining Swin Transformer and CNN for Image Inpainting Forensics," in Journal of Communications Software and Systems, vol. 22, no. 1, pp. 119-130, March 2026, doi: https://doi.org/10.24138/jcomss-2025-0247
@article{mohamed2026structuredencoder,
author = {Mohamed Fathy Mohamed and Hala Abdel-Galil ElSayed and Soha Ahmed Ehssan Aly},
title = {A W-Structured Encoder-Decoder Network Combining Swin Transformer and CNN for Image Inpainting Forensics},
journal = {Journal of Communications Software and Systems},
month = {3},
year = {2026},
volume = {22},
number = {1},
pages = {119--130},
doi = {10.24138/jcomss-2025-0247},
url = {https://doi.org/10.24138/jcomss-2025-0247}
}