MemeCLIP: Leveraging Contrastive Language-Image Pre-training (CLIP) for Memotion Analysis
Abstract
Memes, which commonly spread humor, ideas, or even harmful material such as hate and propaganda, are a significant part of Internet culture. A meme consists of an image and supporting text. Memotion Analysis, or meme emotion analysis, is the automatic processing of memes using artificial intelligence. Unimodal solutions are being superseded by multimodal solutions such as feature concatenation, weighted fusion, and the Gated Multimodal Unit (GMU) for better Memotion Analysis. In this work, we propose two deep-learning-based multimodal models for meme emotion classification. In the first model, we use ResNet and DeBERTa as separate image and text encoders for image-text fusion. The second model, 'MemeCLIP', uses an integrated CLIP-based representation with a GMU, whose gating mechanism adaptively fuses visual and text features. In contrast to simple concatenation techniques, the GMU demonstrates superior capability in extracting the fine-grained emotional cues embedded in memes. On Task 8 (Memotion Analysis) of the SemEval-2020 competition, the CLIP-based model 'MemeCLIP' achieved an F1-score of 0.65, closely followed by the ResNet+DeBERTa model with a score of 0.64, compared to the SemEval baseline of 0.5118. These findings demonstrate the strength of selectively regulating modality contributions.
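The gated fusion mentioned in the abstract can be illustrated with a minimal sketch. It follows the commonly used GMU formulation (a tanh projection per modality and a sigmoid gate computed from the concatenated inputs) and is not the authors' implementation. The function names, toy dimensions, and random weights are assumptions made for illustration; in practice the inputs would be image and text embeddings and the weights would be learned.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def gmu_fuse(x_v, x_t, W_v, W_t, W_z):
    """GMU-style fusion of a visual vector x_v and a text vector x_t.

    h_v = tanh(W_v x_v), h_t = tanh(W_t x_t)
    z   = sigmoid(W_z [x_v; x_t])
    h   = z * h_v + (1 - z) * h_t   (element-wise)
    """
    h_v = [math.tanh(a) for a in matvec(W_v, x_v)]
    h_t = [math.tanh(a) for a in matvec(W_t, x_t)]
    # The gate sees both modalities (list concatenation of the two inputs).
    z = [sigmoid(a) for a in matvec(W_z, x_v + x_t)]
    h = [zi * hv + (1.0 - zi) * ht for zi, hv, ht in zip(z, h_v, h_t)]
    return h, z


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Toy sizes chosen for illustration only.
rng = random.Random(0)
d_v, d_t, d_h = 4, 3, 2
x_v = [rng.uniform(-1.0, 1.0) for _ in range(d_v)]  # stand-in image features
x_t = [rng.uniform(-1.0, 1.0) for _ in range(d_t)]  # stand-in text features
W_v = random_matrix(d_h, d_v, rng)
W_t = random_matrix(d_h, d_t, rng)
W_z = random_matrix(d_h, d_v + d_t, rng)

fused, gate = gmu_fuse(x_v, x_t, W_v, W_t, W_z)
print("gate:", gate)
print("fused:", fused)
```

Each gate value lies strictly between 0 and 1, so every fused dimension is a convex combination of the visual and text projections. This is the sense in which the gate regulates how much each modality contributes, whereas plain concatenation passes both through with fixed weight.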
Keywords
Meme Classification, Memotion Analysis, CLIP, DeBERTa, ResNet
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
V. Ganganwar, G. Chauhan, J. Singh, S. Khajuria and V. Battan, "MemeCLIP: Leveraging Contrastive Language-Image Pre-training (CLIP) for Memotion Analysis," in Journal of Communications Software and Systems, vol. 22, no. 2, pp. 232-243, June 2026, doi: https://doi.org/10.24138/jcomss-2025-0150
@article{ganganwar2026memeclipleveraging,
author = {Vaishali Ganganwar and Gaurav Singh Chauhan and Jangveer Singh and Shashvat Khajuria and Vivek Battan},
title = {MemeCLIP: Leveraging Contrastive Language-Image Pre-training (CLIP) for Memotion Analysis},
journal = {Journal of Communications Software and Systems},
month = {6},
year = {2026},
volume = {22},
number = {2},
pages = {232--243},
doi = {10.24138/jcomss-2025-0150},
url = {https://doi.org/10.24138/jcomss-2025-0150}
}