A W-Structured Encoder-Decoder Network Combining Swin Transformer and CNN for Image Inpainting Forensics

Mohamed, Mohamed Fathy; ElSayed, Hala Abdel-Galil; Ehssan Aly, Soha Ahmed

doi:https://doi.org/10.24138/jcomss-2025-0247

Published online: Mar 31, 2026

Authors:

Mohamed Fathy Mohamed, Hala Abdel-Galil ElSayed, Soha Ahmed Ehssan Aly

Abstract

Image inpainting is advancing rapidly, driven by progress in deep learning techniques and generative models. This progress threatens the integrity of digital content, because inpainting techniques can produce highly realistic altered images that cannot be easily detected. To address these risks, this paper proposes a new deep learning-based image inpainting forensic network called W2SC-Net. The proposed architecture adopts a W-structured encoder-decoder design that integrates Swin Transformers with convolutional neural networks (CNNs). Specifically, the encoder block consists of two parallel streams that extract both local textures and global contextual information. The decoder block is connected to the downsampling stages to enable accurate reconstruction. In addition, a high-pass filtered enhancement block is employed to highlight inpainting artifacts. Extensive experiments demonstrate both the high detection performance of the proposed model and its strong generalization capability: although trained on only one inpainting method, it accurately detects image manipulations across ten inpainting methods and diverse image datasets. Moreover, the robustness of W2SC-Net against anti-forensics attacks is further improved by introducing an additional training process. Finally, W2SC-Net outperforms state-of-the-art forensic approaches in terms of the F1-score and AUC evaluation metrics.
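
To illustrate why high-pass filtering can help expose inpainting artifacts, the following is a minimal sketch in plain Python. It is not the enhancement block of W2SC-Net: the abstract does not specify the filter, so the 3x3 Laplacian-style kernel, the replicated-border handling, and the names KERNEL and high_pass are all assumptions chosen for illustration. The sketch shows only the general idea that a zero-sum kernel suppresses smooth image content and keeps abrupt local changes.

```python
# Hypothetical 3x3 Laplacian-style high-pass kernel (coefficients sum to zero).
# The filter actually used in the paper is not given in the abstract.
KERNEL = [
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1],
]


def high_pass(image):
    """Return the high-pass residual of a 2-D grayscale image (list of rows).

    Border pixels are handled by replicating the nearest edge pixel,
    so the output has the same size as the input.
    """
    height, width = len(image), len(image[0])
    residual = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Clamp coordinates to the image to replicate the border.
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    acc += KERNEL[dy + 1][dx + 1] * image[yy][xx]
            residual[y][x] = acc
    return residual


# A flat region produces no response; a vertical step edge does.
flat = [[5] * 4 for _ in range(4)]
step = [[0, 0, 10, 10] for _ in range(4)]
print(high_pass(flat)[0])  # [0, 0, 0, 0]
print(high_pass(step)[0])  # [0, -30, 30, 0]
```

In a forensic setting, a residual of this kind would typically be passed to the network alongside or in place of the raw pixels, so that subtle texture inconsistencies are not dominated by image content; how W2SC-Net combines the two is described in the full paper, not here.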

Keywords

Deep Learning, Image Inpainting Forensics, Inpainting Detection, Convolutional Neural Network (CNN), Swin Transformer, Dual-Stream Feature Extraction

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.