Graph-based Pattern-oriented, Context-sensitive Code Completion

Anh Tuan Nguyen, Tung Thanh Nguyen, Hoan Anh Nguyen, Ahmed Tamrawi, Hung Viet Nguyen, Jafar Al-Kofahi,
Tien N. Nguyen



Empirical Evaluation

Experiments on Parameter Sensitivity Analysis

We first aimed to measure the accuracy of GraPacc with various values of its parameters. GraPacc has four parameters:

  1. δ: the feature similarity threshold (Section 4.4), and
  2. α, β, γ: the name-based similarity weights used in comparing the package, class, and method names of two features, respectively (a sketch of this weighted comparison follows the list).
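To make the roles of these parameters concrete, the following is a minimal sketch of how such a weighted, name-based feature comparison could look. The class names, field names, and the placeholder string similarity are illustrative assumptions, not GraPacc's actual implementation.

```java
/** Hypothetical feature holding the three name components compared between a query and a pattern. */
final class ApiFeature {
    final String packageName, className, methodName;

    ApiFeature(String packageName, String className, String methodName) {
        this.packageName = packageName;
        this.className = className;
        this.methodName = methodName;
    }
}

final class FeatureMatcher {
    private final double alpha, beta, gamma; // weights for package, class, and method names
    private final double delta;              // feature similarity threshold

    FeatureMatcher(double alpha, double beta, double gamma, double delta) {
        this.alpha = alpha;
        this.beta = beta;
        this.gamma = gamma;
        this.delta = delta;
    }

    /** Weighted, name-based similarity: alpha*package + beta*class + gamma*method. */
    double similarity(ApiFeature query, ApiFeature pattern) {
        return alpha * nameSim(query.packageName, pattern.packageName)
             + beta  * nameSim(query.className,   pattern.className)
             + gamma * nameSim(query.methodName,  pattern.methodName);
    }

    /** A query feature is matched to a pattern feature only if their similarity reaches delta. */
    boolean matches(ApiFeature query, ApiFeature pattern) {
        return similarity(query, pattern) >= delta;
    }

    // Placeholder name similarity (exact, case-insensitive match); GraPacc's actual
    // string similarity measure is not reproduced here.
    private static double nameSim(String a, String b) {
        return a.equalsIgnoreCase(b) ? 1.0 : 0.0;
    }
}
```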
Figure 1. Accuracy with different thresholds on Dom4J

In our first experiment, we ran GraPacc on the subject system Dom4J, which contains 127 source files and 565 test methods that use Java Util, with a total of 660 recommended API usages involving 1,637 API elements. We fixed α = β = γ = 1/3, varied δ from 0.3 to 1 in steps of 0.1, and measured precision, recall, and f-score at each value of δ. Figure 1 shows the results: precision ranges from 80% to 93%, recall from 70% to 79%, and f-score from 75% to 85%. In general, f-score is high and quite stable. As the feature similarity threshold δ increases, f-score also increases slightly and peaks at δ = 0.9. This is reasonable because a high threshold on name-based feature similarity helps GraPacc avoid incorrect matches between types, variables, and other API elements in a query and those in the patterns, i.e., it improves precision. However, strict matching (δ = 1) reduces recall, since it allows only identical names to be matched, thus reducing the number of matched features and pattern candidates.
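For reference, the following sketch shows the structure of this δ sweep. The f-score is assumed here to be the standard F1 measure (harmonic mean of precision and recall), since the section does not restate the formula, and evaluate() is a hypothetical stand-in for running GraPacc on the test methods.

```java
// Sketch of the delta sweep in the first experiment. evaluate() is a hypothetical
// placeholder for running GraPacc on the 565 test methods of Dom4J.
final class DeltaSweep {

    /** Assumed F1: harmonic mean of precision and recall. */
    static double fScore(double precision, double recall) {
        return (precision + recall == 0.0) ? 0.0 : 2.0 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        final double alpha = 1.0 / 3, beta = 1.0 / 3, gamma = 1.0 / 3; // fixed weights
        for (double delta = 0.3; delta <= 1.0 + 1e-9; delta += 0.1) {
            double[] pr = evaluate(alpha, beta, gamma, delta); // {precision, recall}
            System.out.printf("delta=%.1f  precision=%.2f  recall=%.2f  f-score=%.2f%n",
                    delta, pr[0], pr[1], fScore(pr[0], pr[1]));
        }
    }

    // Placeholder: the real numbers come from comparing GraPacc's completions against
    // the 660 recommended API usages; not reproduced here.
    private static double[] evaluate(double alpha, double beta, double gamma, double delta) {
        return new double[] { 0.0, 0.0 };
    }
}
```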

In our next experiment, we ran GraPacc on Dom4J with various values of the weights α, β, and γ. The higher the value of α, β, or γ, the more weight is given to the similarity of package, class, or method names, respectively, in the feature comparison between a query and a pattern. Since the three weights must sum to 1, we varied only β and γ, each from 0 to 1 in steps of 0.1. We fixed δ at 0.5 to allow α, β, and γ to have more impact on the accuracy.
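A corresponding sketch of this weight sweep is shown below, under the stated constraint that the three weights sum to 1, so α is derived from β and γ (and, as an assumption, combinations with β + γ > 1 are skipped since they would make α negative). runAndScore() is again a hypothetical placeholder.

```java
// Sketch of the weight sweep in the second experiment: beta and gamma are varied in
// steps of 0.1, alpha is derived so that alpha + beta + gamma = 1, and delta is fixed
// at 0.5. runAndScore() is a hypothetical placeholder for one GraPacc evaluation run.
final class WeightSweep {

    public static void main(String[] args) {
        final double delta = 0.5;
        for (double beta = 0.0; beta <= 1.0 + 1e-9; beta += 0.1) {
            for (double gamma = 0.0; beta + gamma <= 1.0 + 1e-9; gamma += 0.1) {
                double alpha = 1.0 - beta - gamma; // weights must sum to 1
                double f = runAndScore(alpha, beta, gamma, delta);
                System.out.printf("beta=%.1f  gamma=%.1f  alpha=%.1f  f-score=%.2f%n",
                        beta, gamma, alpha, f);
            }
        }
    }

    // Placeholder: the actual f-score comes from running GraPacc on Dom4J with this setting.
    private static double runAndScore(double alpha, double beta, double gamma, double delta) {
        return 0.0;
    }
}
```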

Table 1 shows the f-score values for various combinations of β and γ. Each row corresponds to the change of f-score with respect to γ when β is fixed at a value from 0 to 1. When β is zero (i.e., the similarity between class names is disregarded), f-score values are lower than in the other cases. However, when β is non-zero, f-score increases as γ increases, i.e., as more weight is put on method name similarity. Moreover, within each column (i.e., when γ is fixed), f-score does not change much as (non-zero) β varies from 0.1 to 1. Importantly, f-score reaches its highest point of 81% when γ = 0.6 and β = α = 0.2. In addition, the region of maximum f-score values lies around 0.5-0.8 for γ and 0.1-0.5 for β.

Thus, this result shows that putting a higher weight on method name similarity than on package and class name similarity yields higher accuracy. However, the package and class names (i.e., α and β) cannot be discarded when comparing features.