Multi-layered Approach for Recovering Links between Bug Reports and Fixes

Anh Tuan Nguyen, Tung Thanh Nguyen, Hoan Anh Nguyen, Tien N. Nguyen




Figure 1. MLink Approach

We developed MLink, a multi-layered approach that automatically recovers bug-to-fix links. It takes two inputs: the history of bug records in a bug-tracking repository, and the history of commits (commit logs and changed source files) in a version repository. From these, MLink recovers the links between already-fixed bug reports and their corresponding fixing commits (i.e., fixes). Like existing bug-to-fix link recovery approaches, MLink uses meta-data and textual features from bug records (summaries, descriptions, and comments) and from commit logs. In addition, it uses code features found in the information associated with bug records and commits.

MLink recovers links in cascading layers, each of which is a detector with its own set of textual and code features (Figure 1). The input to each layer is the set of candidate links that the previous layers could not confirm. Candidates that a layer cannot confirm are passed on to the next layer, in the expectation that the features used there will reveal additional links. The links detected by all layers are combined into the final link set. Layers whose features give higher confidence of accurate detection are applied earlier.
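The cascade described above can be sketched as a loop over layers. Each layer returns the links it confirms and hands its leftover candidates to the next layer. This is a minimal illustration, not MLink's implementation. The names `run_cascade`, `threshold_layer`, and the per-link scores are hypothetical, and each toy layer simply confirms candidates whose score reaches a threshold.

```python
from typing import Callable, Iterable

# A candidate link pairs a bug ID with a commit (revision) ID.
Link = tuple[str, str]

# A layer receives the candidates left over by earlier layers and returns
# (links it confirms, candidates it passes on to the next layer).
Layer = Callable[[set[Link]], tuple[set[Link], set[Link]]]


def run_cascade(candidates: Iterable[Link], layers: list[Layer]) -> set[Link]:
    """Apply layers in order (most confident first); union their detections."""
    remaining = set(candidates)
    final: set[Link] = set()
    for layer in layers:
        detected, remaining = layer(remaining)
        final |= detected
    return final  # candidates no layer confirmed are dropped


def threshold_layer(scores: dict[Link, float], threshold: float) -> Layer:
    """Hypothetical layer: confirm candidates whose score reaches `threshold`."""
    def layer(cands: set[Link]) -> tuple[set[Link], set[Link]]:
        hits = {c for c in cands if scores.get(c, 0.0) >= threshold}
        return hits, cands - hits
    return layer


cands = {("B1", "C1"), ("B1", "C2"), ("B2", "C2")}
strict = threshold_layer({("B1", "C1"): 0.9}, threshold=0.8)
loose = threshold_layer({("B2", "C2"): 0.5}, threshold=0.4)
print(sorted(run_cascade(cands, [strict, loose])))  # [('B1', 'C1'), ('B2', 'C2')]
```

Ordering the more confident layer first matters. Once it claims a link, that link is removed from the candidates seen by later, noisier layers.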

MLink Architectural Overview.

Figure 1 shows how MLink recovers bug-to-fix links. The feature extractor first analyzes two sources: bug records in the bug-tracking database, and commits with their logs and changed code. The extracted features are fed to the detectors at the appropriate layers. Time features go to the filtering layer. Recommended patch code, names of program entities and system components, text features, and term features go to the patch-based, name-based, text-based, and term/code association-based detectors, respectively. We explain each layer below.

  1. Pattern-based detector. It relies on two kinds of textual hints:

    a) notes/hints that fixers leave in commit logs about the issues/bugs their changes are intended to fix. Typical patterns/phrases include "fix the issue #...", "fix the bug ID...", etc.

    b) notes that fixers/commenters leave in a bug record referring to the revision(s) that fixed it. Common patterns/phrases include "fixed by r123", "fixed in r123", etc.

  2. Filtering layer. Candidate links not detected by the pattern-based detector are checked, and those that violate a time constraint are removed.

  3. Patch-based detector. This layer first extracts any patch code that bug reporters or commenters recommend within bug descriptions/comments. It then matches that code against the changed code portions of the candidate commit.

  4. Name-based detector. Recommended patches are not always available. Even so, the names of program entities and system components are often mentioned both in bug records (descriptions, summaries, comments) and in fixes (commit logs, changed code, and inline comments). This detector links a bug record and a commit that mention the same names.

  5. Text-based detector. After the layers that use code-based features have run, MLink compares bug records with fixes via textual features. These are extracted from bug descriptions/summaries/comments and from commits (their logs and the inline comments in the changed code).

  6. Association-based detector. This detector is used when the texts or entity names in bug records and commits are not similar. Using the link set detected by the pattern-based detector, MLink computes association strengths between terms in bug reports and entity names in commits, then uses them to infer links. That is, a commit containing certain entity names is likely the fix for bug records that contain the terms associated with those names, and vice versa.
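The pattern-based detector (layer 1) can be approximated with regular expressions over commit logs and bug texts. The regexes below are hypothetical approximations of the phrases quoted in the list ("fix the issue #...", "fixed in r123"), not the patterns MLink actually uses. They also assume that revisions are keyed by their number as a string, e.g. `"2002"` for r2002.

```python
import re

# Hypothetical patterns for commit logs: "fix the issue #12", "fixes bug ID 45", ...
LOG_PATTERNS = [
    re.compile(r"\bfix(?:e[sd])?\s+(?:the\s+)?(?:issue|bug)\s*(?:id\s*)?#?\s*(\d+)",
               re.IGNORECASE),
]
# Hypothetical patterns for bug records: "fixed by r123", "fixed in r123".
BUG_PATTERNS = [
    re.compile(r"\bfixed\s+(?:by|in)\s+r(\d+)", re.IGNORECASE),
]


def pattern_detect(bugs: dict[str, str], commits: dict[str, str]) -> set[tuple[str, str]]:
    """bugs: bug ID -> bug text (summary, description, comments);
    commits: revision number -> commit log. Returns (bug ID, revision) links."""
    links = set()
    # (a) hints in commit logs that name a bug ID
    for rev, log in commits.items():
        for pattern in LOG_PATTERNS:
            for bug_id in pattern.findall(log):
                if bug_id in bugs:
                    links.add((bug_id, rev))
    # (b) hints in bug records that name a fixing revision
    for bug_id, text in bugs.items():
        for pattern in BUG_PATTERNS:
            for rev in pattern.findall(text):
                if rev in commits:
                    links.add((bug_id, rev))
    return links


bugs = {"101": "Crash on save. Fixed in r2002.", "102": "Typo in dialog"}
commits = {"2001": "Fix the issue #102: correct label",
           "2002": "refactor saving", "2003": "cleanup"}
print(sorted(pattern_detect(bugs, commits)))  # [('101', '2002'), ('102', '2001')]
```

Mentions of IDs that do not exist in the repositories are ignored, which guards against stray numbers in free text.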
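The text does not spell out the time constraint applied by the filtering layer (layer 2). The sketch below assumes one plausible version: a fixing commit must come after the bug was reported, and no later than a small slack after the bug was marked resolved. The function names, the `slack` parameter, and its one-day default are all hypothetical.

```python
from datetime import datetime, timedelta


def satisfies_time_constraint(reported: datetime, resolved: datetime,
                              committed: datetime,
                              slack: timedelta = timedelta(days=1)) -> bool:
    """Assumed constraint: reported <= committed <= resolved + slack."""
    return reported <= committed <= resolved + slack


def filter_candidates(candidates: set[tuple[str, str]],
                      bug_times: dict[str, tuple[datetime, datetime]],
                      commit_times: dict[str, datetime],
                      slack: timedelta = timedelta(days=1)) -> set[tuple[str, str]]:
    """Keep only (bug ID, revision) candidates that pass the time check.
    bug_times maps bug ID -> (reported, resolved); commit_times maps revision -> time."""
    kept = set()
    for bug_id, rev in candidates:
        reported, resolved = bug_times[bug_id]
        if satisfies_time_constraint(reported, resolved, commit_times[rev], slack):
            kept.add((bug_id, rev))
    return kept


bug_times = {"B1": (datetime(2010, 1, 1), datetime(2010, 1, 10))}
commit_times = {"C1": datetime(2010, 1, 5),       # inside the window
                "C2": datetime(2009, 12, 31),     # before the report
                "C3": datetime(2010, 1, 10, 12),  # within the slack
                "C4": datetime(2010, 1, 20)}      # far too late
cands = {("B1", rev) for rev in commit_times}
print(sorted(filter_candidates(cands, bug_times, commit_times)))
```

The slack tolerates small clock or workflow differences, such as a bug being closed shortly before the fix is committed.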
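For the patch-based detector (layer 3), a simple matcher can measure how many lines of a recommended patch reappear among a commit's changed lines. This sketch assumes that patches are pasted into bug comments as unified diffs and that whitespace-normalized exact line matching is adequate. Both assumptions are simplifications, and the function names are hypothetical. A bulleted "- item" line in ordinary prose would also be mistaken for a patch line here.

```python
def normalize(line: str) -> str:
    """Collapse whitespace so indentation differences do not block a match."""
    return " ".join(line.split())


def extract_patch_lines(bug_text: str) -> set[str]:
    """Keep the bodies of '+'/'-' diff lines, skipping '+++'/'---' file headers."""
    lines = set()
    for raw in bug_text.splitlines():
        if raw.startswith(("+++", "---")):
            continue
        if raw.startswith(("+", "-")):
            body = normalize(raw[1:])
            if body:
                lines.add(body)
    return lines


def patch_match_score(bug_text: str, changed_lines: list[str]) -> float:
    """Fraction of recommended patch lines found among the commit's changed
    lines; 0.0 when the bug record contains no patch."""
    patch = extract_patch_lines(bug_text)
    if not patch:
        return 0.0
    changed = {normalize(line) for line in changed_lines}
    return len(patch & changed) / len(patch)


bug = ("NPE in parser.\n"
       "--- a/Parser.java\n"
       "+++ b/Parser.java\n"
       "-  return node.name;\n"
       "+  return node == null ? null : node.name;\n")
print(patch_match_score(bug, ["  return node.name;",
                              "return node == null ? null : node.name;"]))  # 1.0
```

A candidate would then be confirmed when the score exceeds some threshold. Choosing that threshold is left open here.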
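The name-based detector (layer 4) needs a way to spot program-entity and component names in free text and code. The heuristic below treats identifiers containing an underscore or an inner uppercase letter (e.g. `FileDialog`, `openFile`, `max_size`) as entity names, splitting dotted names into their parts. It then compares the two sides with Jaccard overlap. Both the heuristic and the similarity measure are assumptions for illustration. For instance, the heuristic misses single-word class names such as `Parser`.

```python
import re

# Identifiers, optionally dotted (e.g. "FileDialog.openFile").
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*")


def entity_names(text: str) -> set[str]:
    """Heuristically collect names that look like program entities."""
    names = set()
    for token in TOKEN_RE.findall(text):
        for part in token.split("."):
            # underscore or an uppercase letter after the first character
            if "_" in part or any(c.isupper() for c in part[1:]):
                names.add(part)
    return names


def name_similarity(bug_text: str, commit_text: str) -> float:
    """Jaccard overlap of entity names in a bug record and a commit
    (log, changed code, and inline comments concatenated)."""
    a, b = entity_names(bug_text), entity_names(commit_text)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


bug = "Crash when FileDialog.openFile is called with null path"
commit = ("Guard null in openFile\n"
          "public void openFile(String path) { if (path == null) return; }")
print(entity_names(bug))             # {'FileDialog', 'openFile'} (set order may vary)
print(name_similarity(bug, commit))  # 0.5
```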
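The text-based detector (layer 5) compares bug and commit text, but the section does not say which similarity measure MLink uses. As one common choice, the sketch below uses cosine similarity between term-frequency vectors, with a tiny stop-word list. The stop-word list and the function names are illustrative assumptions, and no stemming is done, so "save" and "saving" count as different terms.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list.
STOPWORDS = {"the", "a", "an", "is", "in", "on", "of", "to", "and", "when", "with", "for"}


def terms(text: str) -> Counter:
    """Lowercased alphanumeric terms with stop words removed."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def text_similarity(bug_text: str, commit_text: str) -> float:
    """Bug text = description/summary/comments; commit text = log + inline comments."""
    return cosine(terms(bug_text), terms(commit_text))


print(text_similarity("save crash", "crash when saving"))  # 0.5
```

A weighting scheme such as tf-idf could replace raw counts. The overall structure of the comparison would stay the same.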
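For the association-based detector (layer 6), association strengths can be learned from the links that the pattern-based detector already found. The measure used below is an assumption, since the section does not name one. The strength of (term, name) is the fraction of training links whose bug contains the term and whose commit also contains the name. A candidate is then scored by averaging, over its bug terms, the strongest association to any name in its commit. All function names are hypothetical.

```python
from collections import Counter
from itertools import product


def learn_associations(training_links: list[tuple[str, str]],
                       bug_terms: dict[str, set[str]],
                       commit_names: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """training_links: (bug ID, revision) pairs, e.g. from the pattern-based layer.
    Returns assoc[(term, name)] = co-occurrences / training links containing the term."""
    pair_counts: Counter = Counter()
    term_counts: Counter = Counter()
    for bug_id, rev in training_links:
        terms, names = bug_terms[bug_id], commit_names[rev]
        term_counts.update(terms)
        pair_counts.update(product(terms, names))
    return {(t, n): c / term_counts[t] for (t, n), c in pair_counts.items()}


def association_score(assoc: dict[tuple[str, str], float],
                      terms: set[str], names: set[str]) -> float:
    """Mean over bug terms of the strongest association to any commit name."""
    if not terms or not names:
        return 0.0
    best = [max(assoc.get((t, n), 0.0) for n in names) for t in terms]
    return sum(best) / len(best)


bug_terms = {"B1": {"font", "render"}, "B2": {"font", "size"}, "B3": {"font"}}
commit_names = {"r1": {"FontCache"}, "r2": {"FontCache", "SizeCalc"},
                "r3": {"FontCache"}, "r4": {"NetClient"}}
assoc = learn_associations([("B1", "r1"), ("B2", "r2")], bug_terms, commit_names)
print(association_score(assoc, bug_terms["B3"], commit_names["r3"]))  # 1.0
print(association_score(assoc, bug_terms["B3"], commit_names["r4"]))  # 0.0
```

In this toy data, bug B3 shares no words with commit r3. It is still linked to r3 because the term "font" co-occurred with `FontCache` in every training link that contained "font".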