Mining Software Repositories for Defect Categorization

Published online: Mar 23, 2015 Full Text: PDF (1019 KiB) DOI: 10.24138/jcomss.v11i1.115
Cite this paper
Sakthi Kumaresh, Ramachandran Baskaran


Early detection of software defects is very important to decrease the software cost and subsequently increase the software quality. Success of software industries not only depends on gaining knowledge about software defects, but largely reflects from the manner in which information about defect is collected and used. In software industries, individuals at different levels from customers to engineers apply diverse mechanisms to detect the allocation of defects to a particular class. Categorizing bugs based on their characteristics helps the Software Development team take appropriate actions to reduce similar defects that might get reported in future releases. Classification, if performed manually, will consume more time and effort. Human resource having expert testing skills & domain knowledge will be required for labeling the data. Therefore, the need of automatic classification of software defect is high. This work attempts to categorize defects by proposing an algorithm called Software Defect CLustering (SDCL). It aims at mining the existing online bug repositories like Eclipse, Bugzilla and JIRA for analyzing the defect description and its categorization. The proposed algorithm is designed by using text clustering and works with three major modules to find out the class to which the defect should be assigned. Software bug repositories hold software defect data with attributes like defect description, status, defect open and close date. Defect extraction module extracts the defect description from various bug repositories and converts it into unified format for further processing. Unnecessary and irrelevant texts are removed from defect data using data preprocessing module. Finally grouping of defect data into clusters of similar defect is done using clustering technique. The algorithm provides classification accuracy more than 80% in all of the three above mentioned repositories.


Software Defect, Defect Classification, Bug Repository, Clustering, Bug Categorization
Creative Commons License 4.0
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.