Fuzzy Set and Cache-based Approach for Bug Triaging
Ahmed Tamrawi, Tung Thanh Nguyen, Jafar Al-Kofahi, Tien N. Nguyen

Selection of Terms

We conducted a similar experiment for the selection of terms. Bugzie allows the ranking process to use only the top-k terms that are most correlated with each fixer, as determined by their correlation/membership scores. We ran Bugzie with values of k ranging from 1 to 5,000. With k=5,000 for each developer, the system-wide term list T(k) covers almost all terms available in the bug reports. If a developer has fewer than k terms, all of his or her associated terms with non-zero correlation scores are used. For each value of k, we measured the top-n prediction accuracy and the total processing time. Figures 6 and 7 show the top-1 and top-5 prediction accuracy on all datasets for different values of k.
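The selection step described above can be illustrated with a minimal Python sketch. It is not Bugzie's actual implementation: the input layout (a mapping from developer to per-term membership scores), the function names, and the toy data are all assumptions made for illustration.

```python
from typing import Dict, Set

# Hypothetical input layout: developer -> {term: membership score}.
Scores = Dict[str, Dict[str, float]]


def select_top_k_terms(scores: Scores, k: int) -> Scores:
    """Keep, for each developer, the k terms with the highest non-zero scores.

    A developer with fewer than k non-zero terms keeps all of them.
    """
    selected: Scores = {}
    for dev, term_scores in scores.items():
        nonzero = [(t, s) for t, s in term_scores.items() if s > 0.0]
        # Highest score first; ties broken alphabetically for determinism.
        nonzero.sort(key=lambda ts: (-ts[1], ts[0]))
        selected[dev] = dict(nonzero[:k])
    return selected


def system_term_list(selected: Scores) -> Set[str]:
    """T(k): the union of the selected terms over all developers."""
    terms: Set[str] = set()
    for term_scores in selected.values():
        terms.update(term_scores)
    return terms


# Toy data (invented for this example).
scores = {
    "alice": {"parser": 0.9, "crash": 0.4, "ui": 0.0, "null": 0.7},
    "bob": {"ui": 0.8, "layout": 0.0},
}
top2 = select_top_k_terms(scores, 2)
print(top2)
print(sorted(system_term_list(top2)))
```

In this toy input, "bob" has only one term with a non-zero score, so with k=2 that single term is kept, mirroring the rule for developers with fewer than k terms.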





As seen, the graphs for all projects except Apache have similar shapes. They exhibit an interesting phenomenon: accuracy increases and peaks when 3-20 terms are used; when more terms are used, accuracy decreases slightly and then stabilizes. Thus, selecting a small yet significant set of terms for the ranking computation in fact improves prediction accuracy.

More importantly, selecting only a small portion of available terms also significantly improves time efficiency. Figure 8 shows the graph for the total processing time.
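The experimental procedure (vary k, then record top-n accuracy and processing time) can be sketched as follows. This is a hedged illustration only: the fuzzy-union ranking score, all function names, and the toy data are assumptions, since this section does not give the ranking formula. The toy results are not meant to reproduce the trends in Figures 6-8.

```python
import time
from typing import Dict, List, Set, Tuple

Scores = Dict[str, Dict[str, float]]   # developer -> {term: membership score}
Report = Tuple[Set[str], str]          # (terms in the report, actual fixer)


def top_k(scores: Scores, k: int) -> Scores:
    """Keep each developer's k highest non-zero term scores."""
    return {
        dev: dict(sorted(((t, s) for t, s in ts.items() if s > 0.0),
                         key=lambda x: (-x[1], x[0]))[:k])
        for dev, ts in scores.items()
    }


def rank(selected: Scores, report_terms: Set[str]) -> List[str]:
    """Rank developers by an assumed fuzzy-union score over the report's terms."""
    def score(term_scores: Dict[str, float]) -> float:
        miss = 1.0
        for t in report_terms:
            miss *= 1.0 - term_scores.get(t, 0.0)
        return 1.0 - miss
    # Highest score first; ties broken by developer name for determinism.
    return sorted(selected, key=lambda d: (-score(selected[d]), d))


def top_n_accuracy(selected: Scores, reports: List[Report], n: int) -> float:
    """Fraction of reports whose actual fixer appears in the top-n ranking."""
    hits = sum(1 for terms, fixer in reports if fixer in rank(selected, terms)[:n])
    return hits / len(reports)


def sweep(scores: Scores, reports: List[Report], ks: List[int], n: int):
    """For each k, return (k, top-n accuracy, elapsed seconds)."""
    results = []
    for k in ks:
        start = time.perf_counter()
        acc = top_n_accuracy(top_k(scores, k), reports, n)
        results.append((k, acc, time.perf_counter() - start))
    return results


# Toy data (invented for this example).
scores = {
    "alice": {"parser": 0.9, "crash": 0.3, "null": 0.6},
    "bob": {"ui": 0.8, "crash": 0.5, "layout": 0.4},
}
reports = [({"parser", "crash"}, "alice"), ({"ui", "layout"}, "bob"), ({"crash"}, "bob")]

for k, acc, seconds in sweep(scores, reports, ks=[1, 3], n=1):
    print(f"k={k}: top-1 accuracy={acc:.2f}, time={seconds:.6f}s")
```

Timing with time.perf_counter covers both term selection and ranking for each k, which loosely corresponds to the total processing time reported in the text.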