Graph-based Pattern-oriented, Context-sensitive Code Completion

Anh Tuan Nguyen, Tung Thanh Nguyen, Hoan Anh Nguyen, Ahmed Tamrawi, Hung Viet Nguyen, Jafar Al-Kofahi,
Tien N. Nguyen

Video Demo Downloads

Empirical Evaluation

Accuracy in Code Completion


1. Data collection:

We collected a total of 28 open-source Java projects in different domains that use Java Utility library. We then used our prior pattern mining tool, GrouMiner, to collect all possible API usage patterns of Java Utility library from a set of 4 Java projects, which were used as the tool's knowledge (Table 1). The other projects were used for evaluation.

Training systems
Table 1. Training data for Java Utility Patterns


2. Experiment procedure:

For each of those projects under testing, our evaluation tool collected all of its methods that use Java Utility APIs. For such a method, we simulated a real programming situation. We assumed that a developer partially finished his/her coding in that method and requested the help from GraPacc to complete the remaining of that method. Thus, we divided the code of the method under testing (called a test method) into two parts: the first part was used as a query, and the second part was used as the ground truth for comparing and computing the accuracy of GraPacc's code completion.

A method under test might contain multiple Java Utility API patterns. That is, in a real-world situation, a user might need to invoke GraPacc multiple times to get sufficient recommendations to complete the second half of the test method. To correctly simulate that, our evaluation tool iteratively utilized GraPacc for code completion at multiple other focus points (in addition to the first focus point) in the second half of the test method.


3. Evaluation metrics:

The accuracy of code completion is measured by precision, recall, and f-score.

  • Precision is computed as Nc/(Nn + Nc), i.e. the ratio of the number of correctly recommended nodes over the total number of all recommended nodes.
  • Recall is computed as Nc/(Nc + Nm), i.e. the ratio of the number of correctly recommended nodes over the total of all completion-needed nodes.
  • F-score, a measurement that represents a harmonic average of precision and recall, is calculated as: f-score = 2 / (1/precision + 1/recall). Higher f-score signifies better accuracy.