Fuzzy Set and Cache-based Approach for Bug Triaging
Ahmed Tamrawi, Tung Thanh Nguyen, Jafar Al-Kofahi, Tien N. Nguyen

Bugzie Model Overview

In Bugzie, the problem of automatic bug triaging is modeled as follows: given a bug report, find the developer(s) with the most fixing capability/expertise with respect to the reported technical issue(s).

Existing approaches treat this as a classification problem: each developer is considered a class of bug reports, whose characteristics are learned from the reports that he/she has fixed in the past. An unfixed bug report is then assigned to the developer(s) whose class(es) are most relevant/similar to the report.

In contrast, Bugzie considers this a ranking problem: for each given bug report, it determines a ranked list of the developers who are most capable of handling the reported technical issue(s). Thus, instead of learning the characteristics of each class/developer from his/her past fixed reports, Bugzie determines and ranks the fixing capability/expertise of the developers with respect to the technical aspects by modeling the correlation/association between a developer and a technical aspect. That is, if a developer has a higher fixing correlation with a technical aspect, (s)he is considered to have higher capability/expertise on that aspect and is ranked higher.

Because “technical aspect” is an abstract concept with potentially different levels of granularity, Bugzie models technical aspects via their corresponding descriptive technical terms. That is, a technical aspect is considered a collection of technical terms that are extracted directly from the software artifacts of a project, and more specifically from its bug reports. Bugzie uses fuzzy set theory to model the fixing correlation/association between developers and technical terms/aspects, and uses this correlation to recommend the most capable fixers for a given bug report. Bugzie also uses the locality of fixing activity to select the fixer candidates, and uses the levels of correlation between fixers and terms to identify the most correlated/important terms for each developer.
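
The ranking idea described above can be illustrated with a minimal sketch. The overview does not give concrete formulas, so everything below is an assumption made for illustration only: the developer-term membership is taken to be a Jaccard-style ratio over fixed reports, the memberships of a report's terms are combined with a fuzzy union (algebraic sum), and the "locality of fixing activity" is approximated by restricting candidates to the fixers of the most recent reports. The names HISTORY, build_counts, membership, rank_developers, and cache_size are hypothetical, and the toy data is invented.

```python
from collections import Counter, defaultdict
from math import prod

# Hypothetical toy history, oldest first: (fixer, terms extracted from the fixed report).
HISTORY = [
    ("alice", {"render", "layout"}),
    ("bob", {"socket", "timeout"}),
    ("alice", {"render", "font"}),
    ("carol", {"socket", "render"}),
    ("bob", {"socket", "layout"}),
]


def build_counts(history):
    """Count reports per developer, per term, and per (developer, term) pair."""
    n_dev = Counter()               # reports fixed by each developer
    n_term = Counter()              # reports containing each term
    n_both = defaultdict(Counter)   # n_both[dev][term]: reports fixed by dev that contain term
    for dev, terms in history:
        n_dev[dev] += 1
        for term in terms:
            n_term[term] += 1
            n_both[dev][term] += 1
    return n_dev, n_term, n_both


def membership(dev, term, n_dev, n_term, n_both):
    """Assumed fuzzy membership of dev in the set of capable fixers for term.

    Jaccard-style ratio: |fixed by dev AND contains term| / |fixed by dev OR contains term|.
    """
    both = n_both[dev][term]
    union = n_dev[dev] + n_term[term] - both
    return both / union if union else 0.0


def rank_developers(report_terms, history, cache_size=None):
    """Rank candidate fixers for a new report, most capable first.

    cache_size (assumption): if given, only fixers of the last cache_size
    reports are candidates, as a stand-in for locality of fixing activity.
    """
    n_dev, n_term, n_both = build_counts(history)
    recent = history if cache_size is None else history[-cache_size:]
    candidates = {dev for dev, _ in recent}
    scores = {}
    for dev in candidates:
        # Fuzzy union over the report's terms: 1 - product of (1 - membership).
        miss = prod(
            1.0 - membership(dev, term, n_dev, n_term, n_both)
            for term in report_terms
        )
        scores[dev] = 1.0 - miss
    # Sort by descending score; break ties by name for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for dev, score in rank_developers({"socket", "timeout"}, HISTORY):
        print(f"{dev}: {score:.3f}")
```

On this toy history, a report containing the terms "socket" and "timeout" ranks bob first, then carol, then alice; with cache_size=2, alice is not considered at all because she fixed none of the two most recent reports. The output is a ranked list rather than a single predicted class, which is the distinction from the classification view drawn above.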