Fuzzy Set and Cache-based Approach for Bug Triaging
Ahmed Tamrawi, Tung Thanh Nguyen, Jafar Al-Kofahi, Tien N. Nguyen

Introduction

For comparison, we used Weka to re-implement existing state-of-the-art approaches following the descriptions in their papers. Cubranic and Murphy use Naive Bayes. Anvik et al. use SVM, Naive Bayes, and C4.5 classifiers. Bhattacharya and Neamtiu use Naive Bayes and Bayesian Network, with and without incremental learning. We also re-implemented Matter et al.’s vector space model (VSM) according to their paper; for comparison, the terms were extracted only from the bug reports.

Because some machine-learning approaches implemented in Weka (e.g., C4.5) cannot scale up to the full datasets, we prepared smaller datasets that contain 3-year histories taken from the full datasets (see Table 7).
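A reduced dataset of this kind could be produced with a simple date filter. The following is only a minimal sketch: the text does not state which three years were selected, so the choice of the most recent three years is an assumption, and the function name `recent_window` and the `created` field are hypothetical.

```python
from datetime import date


def recent_window(reports, years=3):
    """Keep reports created within `years` years of the newest report.

    Assumption: the window is the most recent `years` years. Each report is
    a dict with a hypothetical `created` field holding a datetime.date.
    """
    if not reports:
        return []
    latest = max(r["created"] for r in reports)
    # Compare (year, month, day) tuples so that a 29 February cutoff
    # never has to be built as an actual (possibly invalid) date.
    cutoff = (latest.year - years, latest.month, latest.day)
    return [
        r for r in reports
        if (r["created"].year, r["created"].month, r["created"].day) > cutoff
    ]


if __name__ == "__main__":
    sample = [
        {"id": 1, "created": date(2005, 1, 1)},
        {"id": 2, "created": date(2007, 6, 1)},
        {"id": 3, "created": date(2009, 5, 31)},
    ]
    print([r["id"] for r in recent_window(sample)])
```

The filter preserves the chronological order of the input, which matters for any approach that trains incrementally.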



Table 8 compares the accuracy of the top-1 and top-5 recommendations.
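To make the top-1 and top-5 figures concrete, the sketch below shows one common way to compute top-k accuracy: a recommendation counts as a hit if the developer who actually fixed the bug appears among the first k recommended developers. This is an illustrative assumption about the metric, not code from the evaluated tools; the function name `top_k_accuracy` and the sample data are hypothetical.

```python
from typing import Sequence


def top_k_accuracy(
    ranked: Sequence[Sequence[str]], actual: Sequence[str], k: int
) -> float:
    """Fraction of bug reports whose actual fixer is in the top-k list.

    ranked[i] is the ranked list of recommended developers for report i,
    best candidate first; actual[i] is the developer who fixed report i.
    """
    if len(ranked) != len(actual):
        raise ValueError("ranked and actual must have the same length")
    if not actual:
        return 0.0
    hits = sum(1 for recs, fixer in zip(ranked, actual) if fixer in recs[:k])
    return hits / len(actual)


if __name__ == "__main__":
    # Hypothetical recommendations for three bug reports.
    ranked = [
        ["alice", "bob", "carol"],
        ["bob", "alice", "dave"],
        ["carol", "dave", "erin"],
    ]
    actual = ["alice", "alice", "zed"]
    print(top_k_accuracy(ranked, actual, k=1))
    print(top_k_accuracy(ranked, actual, k=2))
```

By construction, top-5 accuracy is never lower than top-1 accuracy for the same set of recommendations.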



Training and prediction times are given in Table 9.



As seen, Bugzie consistently outperforms the other approaches in terms of both prediction accuracy and time efficiency for all subjects.