Fuzzy Set and Cache-based Approach for Bug Triaging
Ahmed Tamrawi, Tung Thanh Nguyen, Jafar Al-Kofahi, Tien N. Nguyen

Algorithm

Here, we describe the key algorithm in Bugzie. With two adjustable parameters x (for fixer candidates) and k (for selected term lists), Bugzie operates in three main phases:

  1. Initializing, i.e. building the fuzzy sets for the technical terms collected from the initially available information (e.g. already-fixed bug reports):
    In this phase, Bugzie uses a collection of already-fixed bug reports to build its initial internal data, including:
    1) the fuzzy sets of capable fixers for the available technical terms,
    2) the fixer candidate list F(x),
    3) the individual term lists Td(k), and
    4) the system-wide term list T(k).

  2. Recommending, i.e. producing a ranked list of developers capable of fixing an unfixed bug report.
    In this phase, Bugzie recommends the most capable developers for a given unfixed bug report B.
    1) It extracts all terms from B and keeps only terms belonging to the selected term list T(k).
    2) It computes the membership scores of all developers in the candidate list F(x).
    3) Bugzie ranks those membership scores and recommends the top-n developers as the most capable fixers for the bug(s) reported in B.

  3. Updating, i.e. updating the fuzzy sets as new information is available (i.e. newly fixed bug reports).
    In this phase, Bugzie incrementally updates its internal data with newly available information (i.e. new bug reports are fixed by some developers).
    1) It updates the counting values nd, nt, and nd;t using newly available fixed bug reports by adding new corresponding counts for the new data.
    2) After updating the counting numbers, Bugzie updates the list F(x), Td(k), and T(k). Instead of re-sorting all available developers and terms to update those lists, Bugzie uses a caching strategy: it stores F(x) as a cache (called developer cache). Thus, for each fixed bug report in the updating data, if the fixer does not belong to the cache, Bugzie will add it to the cache, and if the cache is full, it will remove from the cache the developer(s) having the least recent fixing activity. Similarly, Bugzie also stores Td(k) as caches (called term cache), and updates them based on the membership scores. Td(k) is stored as a descendingly sorted list. During updating, if a term t does not belong to the cache and its score t(d) is larger than that of some term currently in the cache, Bugzie will insert it to the cache, and if the cache is full, it will remove the least-scored term.

    This updating and caching strategy makes our incremental updating very efficient. Importantly, it fits well with software evolution nature.