Fuzzy Set and Cache-based Approach for Bug Triaging
Ahmed Tamrawi, Tung Thanh Nguyen, Jafar Al-Kofahi, Tien N. Nguyen

Data Collection

Our datasets contain bug reports, their corresponding fixers, and related information (e.g., summary, description, and creation/fixing time). Table 1 shows the datasets we collected from seven projects: FireFox (FF), Eclipse (EC), Apache (AP), Netbeans (NB), FreeDesktop (FD), Gcc (GC), and Jazz (JZ). All bug reports and their data were downloaded from the bug tracking systems of the corresponding projects, except for the Jazz data, which is available to us through a grant from IBM Corporation. We collected bug records marked as fixed and closed. Duplicate and unresolved (open) bug reports were excluded, as were reopened and unfinished bug fixes.
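
The selection criteria above can be sketched as a simple filter. This is only an illustration: the record layout and the field names (`status`, `resolution`, `reopened`) are hypothetical and do not correspond to the schema of any particular bug tracking system or to the scripts used for this study.

```python
from dataclasses import dataclass


@dataclass
class BugRecord:
    # Hypothetical record layout, used only for illustration.
    bug_id: int
    status: str       # e.g. "closed" or "open"
    resolution: str   # e.g. "fixed", "duplicate", or "" if unresolved
    reopened: bool    # True if the report was reopened at some point


def keep_for_dataset(record):
    """Keep only reports that are fixed and closed and were never reopened."""
    return (
        record.status == "closed"
        and record.resolution == "fixed"
        and not record.reopened
    )


def select_records(records):
    """Return the subset of records that satisfies the selection criteria."""
    return [r for r in records if keep_for_dataset(r)]
```

Duplicate and open reports fall out of the filter because their resolution is not "fixed" or their status is not "closed".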



In Table 1, Column Time shows the time period of the fixed bug reports. Columns Report and Fixer show the number of fixed bug reports and the number of corresponding fixing developers, respectively. For each bug report, we extracted its unique bug ID, the actual fixing developer’s ID and email address, the creation and fixing times, the summary, and the full description. Comments and discussions were excluded. We merged the summary and description of each bug report, extracted their terms, and preprocessed them by stemming for term normalization and by removing grammatical words and stop words. Column Term in Table 1 shows the total number of terms in each dataset.
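
The preprocessing step described above (merge summary and description, extract terms, remove stop words, stem) can be sketched as follows. This is a minimal sketch under stated assumptions: the stop-word list is a small illustrative sample, and `crude_stem` is a toy suffix stripper standing in for a real stemmer such as Porter's; neither is taken from the actual pipeline, and the function names are hypothetical.

```python
import re

# Small illustrative stop-word list (assumption; not the list used in the study).
STOP_WORDS = {
    "a", "an", "the", "is", "are", "in", "on", "of", "to",
    "and", "when", "it", "this", "with", "for",
}


def crude_stem(word):
    """Toy suffix stripper standing in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        # Strip the suffix only if at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_terms(summary, description):
    """Merge summary and description, tokenize, drop stop words, and stem."""
    text = f"{summary} {description}".lower()
    tokens = re.findall(r"[a-z]+", text)
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


terms = extract_terms("Crash when opening tabs", "The browser crashes on startup")
print(terms)  # ['crash', 'open', 'tab', 'browser', 'crash', 'startup']
```

The number of distinct terms produced this way over all reports of a project corresponds to the kind of count reported in Column Term, although the exact figure depends on the stemmer and stop-word list chosen.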