Leveraging XLM-RoBERTa with CNN and BiLSTM for Hinglish Toxicity Detection
Abstract
Toxicity in online communication, particularly in code-mixed languages like Hinglish, is a growing concern across social media platforms. Hinglish, a blend of Hindi and English, is widely used in informal online conversations, which makes it difficult for traditional toxicity detection models to identify harmful content accurately. The problem is compounded by the limited availability of resources and models trained specifically for Hinglish. This work presents the XLM-RoBERTa-CNN-BiLSTM (XCB) model, a novel architecture for toxicity detection in Hinglish across social media platforms. The XCB model is compared with the state-of-the-art (SOTA) models mBERT, XLM-RoBERTa (XLM-R), and Indic-BERT on three publicly available datasets: Constraint, Facebook, and HASOC. It achieved macro F1 scores of 0.81, 0.73, and 0.82 and inference times of 0.24 s, 0.48 s, and 0.22 s on the Constraint, Facebook, and HASOC datasets, respectively. XCB not only outperforms existing romanized Hinglish models but also matches the macro F1 scores of existing SOTA multilingual models while requiring only half the training time and far lower inference times than those models, making it a much more efficient candidate for large-scale, real-time toxicity detection in Hinglish.
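For readers unfamiliar with the metric reported above, macro F1 is the unweighted mean of the per-class F1 scores, so a minority class such as toxic posts counts as much as the majority class. The following is a minimal sketch of that computation, not the authors' evaluation code; the function name `macro_f1` and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative sketch)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels: 1 = toxic, 0 = non-toxic
y_true = [1, 1, 1, 0]
y_pred = [1, 1, 0, 0]
score = macro_f1(y_true, y_pred)
```

In practice, scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same quantity.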
Keywords
Toxicity Detection, Hinglish, Code-Mixed Language, XCB Model, Real-Time Moderation, Multi-Lingual Models, Efficiency
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
N. Singhal, A. Yadav, Ankush, G. Singh and R. Kumar, "Leveraging XLM-RoBERTa with CNN and BiLSTM for Hinglish Toxicity Detection," in Journal of Communications Software and Systems, vol. 21, no. 4, pp. 394-403, October 2025, doi: https://doi.org/10.24138/jcomss-2025-0133
@article{singhal2025leveragingroberta,
author = {Nikita Singhal and Avadhesh Yadav and Ankush and Giriraj Singh and Ronak Kumar},
title = {Leveraging XLM-RoBERTa with CNN and BiLSTM for Hinglish Toxicity Detection},
journal = {Journal of Communications Software and Systems},
month = {10},
year = {2025},
volume = {21},
number = {4},
pages = {394--403},
doi = {10.24138/jcomss-2025-0133},
url = {https://doi.org/10.24138/jcomss-2025-0133}
}