Optimizing software defect prediction: a fusion of binary horse herd optimizer and machine learning methods


Arasteh B., Bouyer A., Gunes P., Ghanbarzadeh R., Gharehchopogh F. S.

Neural Computing and Applications, vol.37, pp.28295-28331, 2025 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 37
  • Publication Date: 2025
  • DOI: 10.1007/s00521-025-11669-6
  • Journal Name: Neural Computing and Applications
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Applied Science & Technology Source, Biotechnology Research Abstracts, Compendex, Computer & Applied Sciences, Index Islamicus, INSPEC, zbMATH
  • Page Numbers: pp.28295-28331
  • Keywords: Feature selection, Horse herd optimization algorithm, Machine learning, Prediction, Software defect
  • Affiliated with İstanbul Gelişim Üniversitesi: Yes

Abstract

Predicting software defects is a vital component of software engineering, focused on improving software quality by detecting potential issues at an early stage. This process involves predicting which modules are likely to be defective before testing, thereby reducing both the time and the expense of the testing phase. Machine learning has proven to be an effective tool for identifying software modules that are prone to defects. However, achieving accurate and precise classification remains a challenge because of the complexity of the factors within the training data set. Selecting the most relevant features for classification therefore becomes essential, a task commonly addressed with metaheuristic algorithms. This paper presents a new approach by introducing a binary form of the horse herd optimization algorithm (bHOA), specifically designed to enhance feature selection from training data sets. By focusing on the most significant features, the new method aims to significantly improve the precision and accuracy of software defect classifiers. The key contributions of this study are the development of a binary variant of the horse herd optimization algorithm for optimized feature selection and the creation of an effective model for classifying faulty software modules. The efficacy of this method was tested using five real-world and benchmark data sets during both the training and testing phases. The results show that, among the twenty-one features present in the data sets, metrics such as basic cyclomatic complexity, program difficulty, the number of operators and operands, and lines of code are the most predictive of software defects. These findings highlight the substantial improvements in accuracy, precision, recall, and F1 score achieved by integrating bHOA with machine learning methods for software defect prediction.
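
As a rough illustration of the two ideas the abstract combines, the sketch below shows how a continuous search position might be turned into a binary feature mask, and how accuracy, precision, recall, and F1 score are computed for the defective class. It is not the authors' implementation: the sigmoid transfer function is a common choice in binary metaheuristics and is assumed here, and the function names `binarize`, `apply_mask`, and `classification_metrics` are hypothetical.

```python
import math
import random


def binarize(position, rng):
    """Map a continuous position vector to a 0/1 feature mask.

    Assumption: an S-shaped (sigmoid) transfer function is used. The
    abstract does not say which transfer function bHOA applies.
    """
    mask = []
    for x in position:
        prob = 1.0 / (1.0 + math.exp(-x))  # probability of keeping this feature
        mask.append(1 if rng.random() < prob else 0)
    return mask


def apply_mask(row, mask):
    """Keep only the feature values whose mask bit is 1."""
    return [value for value, bit in zip(row, mask) if bit]


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1, treating label 1 as 'defective'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    rng = random.Random(42)
    # Twenty-one candidate features, as in the data sets the abstract describes.
    mask = binarize([rng.uniform(-2.0, 2.0) for _ in range(21)], rng)
    print("selected features:", sum(mask), "of", len(mask))
    print(classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```

In a wrapper-style setup, each candidate mask would be scored by training a classifier on the masked features and feeding its predictions to a metric function like the one above. The abstract does not describe the exact fitness function, so that step is left out.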