Linking with Different Terms on Both Sides
Figure 1. Bug Report #631 and corresponding Fix #1670
Figure 1 shows another bug record (#631) in the ZXing project and its corresponding fixing commit (#1670). The text of the bug record is not similar to the commit log or the changed source code. Moreover, unlike in the previous example, the names of the entities and components in the changed source files do not appear in the summary, description, or comments of the bug record. In this case, matching via common patterns, text, names of program entities and components, or recommended patch code does not work well.
Table 1. Term Occurrences in Reports and Fixes
We further performed a simple text analysis over the entire collection of bug reports and commits for terms appearing in bug record #631 and in its corresponding fixing commit #1670, such as decoding, data, DecodedBitStreamParser, and upperShift (see Table 1). Each row represents a pair of terms, one from the bug report and one from the commit. For example, in the project history, 46 bug reports contain the word decoding, and 10 fixes involve modifications containing DecodedBitStreamParser. Of those 10 fixes, nine have corresponding bug reports that contain the term decoding. The other rows of Table 1 are read in the same way. Thus, if a fixing commit contains the term DecodedBitStreamParser or upperShift (i.e., the fix involves those classes or methods), the corresponding bug record likely contains the terms on the left (e.g., decoding, datamatrix). Therefore, if we can learn such associations between terms on the two sides from the established or detected bug-to-fix links in the project history, we can infer new links: a commit involving certain entity names is likely to be the fix for a bug report containing their associated terms, and vice versa.
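The counting behind such an analysis can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names, the set-of-terms representation, and the toy links below are all hypothetical, and for simplicity the counts are taken only over already-linked report/fix pairs rather than over the entire collection. For each pair of terms it records how many linked fixes contain the fix-side term and how many of those have a report containing the report-side term. The ratio of the two corresponds to the "nine of 10" style figure discussed above.

```python
from collections import Counter
from itertools import product


def term_associations(links):
    """Count term occurrences over established bug-to-fix links.

    links: iterable of (report_terms, fix_terms) pairs, one per link,
    where each element is a set of strings (hypothetical representation).
    """
    report_count = Counter()  # reports containing a given report-side term
    fix_count = Counter()     # fixes containing a given fix-side term
    pair_count = Counter()    # links in which both terms occur
    for report_terms, fix_terms in links:
        report_terms, fix_terms = set(report_terms), set(fix_terms)
        report_count.update(report_terms)
        fix_count.update(fix_terms)
        pair_count.update(product(report_terms, fix_terms))
    return report_count, fix_count, pair_count


def confidence(pair_count, fix_count, report_term, fix_term):
    """Fraction of fixes containing fix_term whose linked report contains report_term."""
    n = fix_count[fix_term]
    return pair_count[(report_term, fix_term)] / n if n else 0.0


# Toy links for illustration only; these are NOT data from the ZXing history.
links = [
    ({"decoding", "datamatrix"}, {"DecodedBitStreamParser", "upperShift"}),
    ({"decoding"}, {"DecodedBitStreamParser"}),
    ({"encoding"}, {"DecodedBitStreamParser"}),
    ({"decoding"}, {"QRCodeReader"}),
]

report_count, fix_count, pair_count = term_associations(links)
score = confidence(pair_count, fix_count, "decoding", "DecodedBitStreamParser")
print(f"decoding -> DecodedBitStreamParser: {score:.2f}")
```

Under these assumptions, a candidate commit could be ranked against an unlinked bug report by combining such scores over all of their term pairs. How to combine them is a design choice that this sketch leaves open.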