Graph-based Pattern-oriented, Context-sensitive Code Completion

Anh Tuan Nguyen, Tung Thanh Nguyen, Hoan Anh Nguyen, Ahmed Tamrawi, Hung Viet Nguyen, Jafar Al-Kofahi,
Tien N. Nguyen


Empirical Evaluation

Usefulness in Code Completion


We conducted a controlled experiment to evaluate how well our code completion tool GraPacc assists developers in programming, compared with standard tool support: Google and Google Code Search (GCS) for finding tutorials and code examples, and Eclipse's built-in code completion. The key idea is to have human subjects with comparable programming experience perform coding tasks, either with code completion support from GraPacc or with support from the standard tools Google/GCS/Eclipse (GCE), and then to compare their resulting code.

1. Experiment settings:

We prepared six different programming tasks (see descriptions). The tasks involve standard Java libraries, including java.util, java.lang, and java.io.

To reduce bias, the tasks were designed by the second author of our paper, while the first author, without knowing those tasks, independently mined and prepared the API usage patterns of those libraries from our collected data set.

To avoid situations in which a subject chooses an algorithm that does not use any of those Java libraries (and thus needs neither GraPacc nor GCE), we designed the tasks so that they can be easily implemented with API elements of those libraries. Each task description includes brief guidance on the algorithm and on the general data structures from those libraries that should be used, such as arrays, lists, maps, sets, and stacks. Together, the tasks cover API elements from many functional areas of the libraries.
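As an illustration, consider a hypothetical task of counting word frequencies in a text file. This is not one of the six actual tasks, and the class and method names below are our own, but a task in this style can be completed almost entirely with java.io and java.util API elements:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    // Illustrative only: a task in this style exercises common java.io and
    // java.util usage patterns (the BufferedReader read loop, the HashMap
    // get/put idiom) that a pattern-oriented completion tool can fill in.
    public class WordFrequency {
        public static Map<String, Integer> count(String path) throws IOException {
            Map<String, Integer> freq = new HashMap<String, Integer>();
            BufferedReader reader = new BufferedReader(new FileReader(path));
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    for (String word : line.split("\\s+")) {
                        if (word.isEmpty()) continue;
                        Integer n = freq.get(word);
                        freq.put(word, n == null ? 1 : n + 1);
                    }
                }
            } finally {
                reader.close();
            }
            return freq;
        }
    }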

We invited four Ph.D. students majoring in Software Engineering at Iowa State University to complete the tasks. Subjects 1 and 2 have 8-9 years of Java programming experience; subjects 3 and 4 have 5-6 years. All subjects are familiar with the aforementioned Java libraries. For the comparison, subjects 1 and 3 formed Group 1, and subjects 2 and 4 formed Group 2, so that the two groups have a comparable mix of experience. To further reduce imbalance between the groups, we applied a crossover design: the two groups exchanged roles after each task (e.g., if Group 1 used GraPacc on task 1 while Group 2 used GCE, they swapped tools on task 2).


2. Evaluation metrics:

We measure the usefulness of the tool support in two dimensions: code quality and developer effort.

  • For code quality, we used the number of programming errors, i.e., bugs. To measure it, we tested the code submitted by the subjects: for each task, we designed a white-box test suite to exercise the functions of the completed code and reveal bugs (see the test sketch after this list).
  • Developer effort is generally measured via the amount of code the developer actually writes. Thus, the usefulness of a tool can be measured as the ratio of the amount of code provided by the tool to the total amount of written code: the higher the ratio, the more code is filled in by the tool, and the more useful the tool is. We used the number of tokens as the metric for code volume. For example, if a completed task contains 400 tokens and the tool filled in 250 of them, the ratio is 250/400 = 62.5%.
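For instance, a white-box test suite for the hypothetical word-frequency task sketched earlier might look as follows; the names are again illustrative, and the actual suites targeted the six real tasks:

    import static org.junit.Assert.assertEquals;

    import java.io.File;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.util.Map;
    import org.junit.Test;

    // Hypothetical white-box tests for the WordFrequency sketch above.
    public class WordFrequencyTest {
        private static File fileWith(String content) throws IOException {
            File f = File.createTempFile("words", ".txt");
            f.deleteOnExit();
            FileWriter w = new FileWriter(f);
            w.write(content);
            w.close();
            return f;
        }

        @Test
        public void countsRepeatedWords() throws IOException {
            Map<String, Integer> freq =
                    WordFrequency.count(fileWith("a b a\nc a b").getPath());
            assertEquals(Integer.valueOf(3), freq.get("a"));
            assertEquals(Integer.valueOf(2), freq.get("b"));
            assertEquals(Integer.valueOf(1), freq.get("c"));
        }

        @Test
        public void emptyInputYieldsEmptyMap() throws IOException {
            // Boundary case chosen by inspecting the read loop (white-box)
            assertEquals(0, WordFrequency.count(fileWith("").getPath()).size());
        }
    }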