Graph-based Pattern-oriented, Context-sensitive Code Completion

Anh Tuan Nguyen, Tung Thanh Nguyen, Hoan Anh Nguyen, Ahmed Tamrawi, Hung Viet Nguyen, Jafar Al-Kofahi,
Tien N. Nguyen

Video Demo Downloads

Grapacc Approach

 

GraPacc takes as an input a database of usage patterns and recommends to complete the current code under editing based on those patterns. In GraPacc, API usage patterns include the usages of multiple variables and/or control structures, and are collected and represented as graph-based models. When a user requests to fill in an incomplete fragment of code, GraPacc will extract context-sensitive features from the code, e.g. the API elements on focus and under modification by the developer, and their relations to other elements. The features are then compared to those of patterns to determine and recommend the patterns that are best-fit with the code under editing.

 

DEFINITION 1 (Pattern). An API usage pattern is a set of API elements (i.e. classes/variables/method calls) and control structures (i.e. condition, repetition) with specific control and data dependencies. An API usage pattern specifies a correct usage of such API elements, as intended by the API designer(s), to perform a specific programming task.

Figure 1 (lines 1-2, 9-15) shows an instance of the pattern for creating a GUI window in SWT library.

DEFINITION 2 (Query). A query is a code fragment under editing, i.e. a sequence of textual tokens that are being written in a programming language. A query is generally incomplete (in term of the task that it is intended to achieve) and might not be fully parsable.

SWT query
Figure 2. SWT Query example

 

Figure 2 illustrates a code fragment as a query. The character _ denotes the editing cursor where a developer invokes the code completion tool during programming.

DEFINITION 3 (Groum). A Groum is a graph-based model, in which the nodes represent actions (i.e. method calls), data (i.e. objects/variables), and control points (i.e. branching points of control structures such as if, while, for, etc). The edges represent the control and data dependencies between the nodes. Labels of the nodes are built from the corresponding names of classes/methods/control structures.

Example 1
Figure 3. Graph-based Usage Patterns

 

Figure 3 illustrates the usage patterns discussed in the Motivation part as Groum models.

Because a query might contain the code irrelevant to the patterns under consideration, GraPacc does not directly compare the Groum of the query to those of the patterns. Instead, GraPacc matches the patterns to the code under editing via their features extracted from their Groums and other associated context-sensitive information. Moreover, because a query might be incomplete and/or not fully parsable, the corresponding Groum might not contain all necessary information or might not even be available.

DEFINITION 4 (Feature). A graph-based feature is a sequence of the textual labels of the nodes along a path of a Groum. A token-based feature is a lexical/textual token extracted from a query.

in Figure 3a, there are a graph-based feature of size 1 [Shell.new], a graph-based feature of size 2 [Shell.new, Shell.pack], a graph-based feature of size 3 [Shell.pack, Shell.open, Shell.isDisposed], etc.

The query "for (Iterator _" is incomplete and can not be parsed into an AST. Thus the corresponding Groum is unavailable. However, GraPacc extracts two tokens for and Iterator, and those two tokens could be used to match this query to the patterns that have the usages involving a for loop and an Iterator variable.

DEFINITION 5 (Similarity Measure). Function sim measures the similarity of any two given features f and g based on their textual similarity and the orders of their elements.

DEFINITION 6 (Context-sensitive Weights). Context-sensitive weights measure the significance of the features in a query based on their relations to the focus editing position (user-based factor) and based on the structure of the query's graph-based model (structure-based factor).

DEFINITION 7 (Relevance Measurement). Function fit measures the relevance of a pattern P to a query Q based on the similarity of their features, the context-sensitive weights, and the popularity of pattern P.

DEFINITION 8 (Pattern-based Code Completion). Given a query Q, as an incomplete fragment of code, a code completion tool returns a list of patterns Pi, which are ranked based on their relevance degrees toward Q, i.e. their fit(Pi;Q) values. If a pattern P* is selected, the tool will complete Q with the code corresponding to P* and with the proper replacements of program elements in the current code.