SOM-US: A Novel Under-Sampling Technique for Handling Class Imbalance Problem

Published online: Jan 30, 2024
DOI: 10.24138/jcomss-2023-0133
Authors:
Ajay Kumar

Abstract

Class imbalance classification is a significant research challenge in data mining and machine learning because most real-world datasets are imbalanced. When a dataset is highly imbalanced, most available classification techniques frequently underperform on minority-class cases because they maximize overall accuracy and disregard the relative distribution of each class. Techniques based on sampling methods, cost-sensitive learning, and ensemble methods have recently been employed to handle the class imbalance problem. This paper proposes a new clustering-based under-sampling (US) technique, called SOM-US, that uses the self-organizing map (SOM) to handle the class imbalance problem. To validate the proposed approach, an experimental study applied SOM-US to a NASA software defect dataset to improve the capability of a logistic regression classifier for software defect prediction. The proposed approach was compared with six existing under-sampling methods on two performance measures. The results demonstrate that SOM-US significantly improves the prediction capability of logistic regression compared with the other under-sampling techniques for software defect prediction.
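
The abstract does not spell out the steps of SOM-US, so the following is only a rough sketch of what a SOM-based, clustering-style under-sampler might look like, not the paper's algorithm. It assumes a small rectangular SOM trained on the majority class only, and then keeps majority samples in round-robin order across SOM nodes, nearest-to-node first. The function names (`train_som`, `som_undersample`), the grid size, the decay schedules, and the selection rule are all illustrative assumptions.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def best_matching_unit(weights, x):
    """Return the grid coordinate of the node whose weights are closest to x."""
    return min(weights, key=lambda node: sq_dist(weights[node], x))


def train_som(data, rows=3, cols=3, epochs=20, lr0=0.5, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise each node from a randomly chosen training sample (copied).
    weights = {(r, c): list(rng.choice(data))
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3     # linearly shrinking neighbourhood
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


def som_undersample(X, y, majority_label, n_keep, **som_kwargs):
    """Keep n_keep majority samples spread across SOM nodes, plus all other samples.

    Assumed selection rule (not taken from the paper): group majority samples
    by best matching unit, then pick round-robin across nodes, closest first.
    """
    majority = [x for x, lab in zip(X, y) if lab == majority_label]
    others = [(x, lab) for x, lab in zip(X, y) if lab != majority_label]
    weights = train_som(majority, **som_kwargs)

    clusters = {}
    for x in majority:
        clusters.setdefault(best_matching_unit(weights, x), []).append(x)
    for node, members in clusters.items():
        members.sort(key=lambda m: sq_dist(m, weights[node]))

    queues = [clusters[node] for node in sorted(clusters)]
    kept = []
    while len(kept) < n_keep and any(queues):
        for q in queues:
            if q and len(kept) < n_keep:
                kept.append(q.pop(0))

    X_res = kept + [x for x, _ in others]
    y_res = [majority_label] * len(kept) + [lab for _, lab in others]
    return X_res, y_res
```

A call such as `som_undersample(X, y, majority_label=0, n_keep=n_minority)` would return a balanced training set that could then be passed to a logistic regression classifier. Because the SOM relies on Euclidean distance, features would normally be scaled to comparable ranges first.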

Keywords

Class Imbalance, Under-Sampling, Software Defect Prediction
Creative Commons License 4.0
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.