GraPacc takes as an input a database of usage patterns and
recommends to complete the current code under editing based on those
patterns. In GraPacc, API usage patterns include the usages of multiple
variables and/or control structures, and are collected and represented
as graph-based models. When a user requests to fill in an incomplete
fragment of code, GraPacc will extract context-sensitive features from
the code, e.g. the API elements on focus and under modification by the
developer, and their relations to other elements. The features are then
compared to those of patterns to determine and recommend the patterns
that are best-fit with the code under editing.
Figure 1. SWT Usage Example 1
DEFINITION 1 (Pattern). An API usage pattern
is a set of API elements (i.e. classes/variables/method calls) and
control structures (i.e. condition, repetition) with specific control
and data dependencies. An API usage pattern specifies a correct usage
of such API elements, as intended by the API designer(s), to perform a
specific programming task.
Figure 1 (lines 1-2, 9-15) shows an instance of the pattern for
creating a GUI window in SWT library.
DEFINITION 2 (Query). A query is a code
fragment under editing, i.e. a sequence of textual tokens that are
being written in a programming language. A query is generally
incomplete (in term of the task that it is intended to achieve) and
might not be fully parsable.
Figure 2. SWT Query example
Figure 2 illustrates a code fragment as a query. The character _
denotes the editing cursor where a developer invokes the code
completion tool during programming.
DEFINITION 3 (Groum). A Groum is a
graph-based model, in which the nodes represent actions (i.e. method
calls), data (i.e. objects/variables), and control points (i.e.
branching points of control structures such as if, while, for, etc).
The edges represent the control and data dependencies between the
nodes. Labels of the nodes are built from the corresponding names of
Figure 3. Graph-based Usage Patterns
Figure 3 illustrates the usage patterns discussed in the
Motivation part as Groum models.
Because a query might contain the code irrelevant to the patterns under
consideration, GraPacc does not directly compare the Groum of the query
to those of the patterns. Instead, GraPacc matches the patterns to the
code under editing via their features extracted from their Groums and
other associated context-sensitive information. Moreover, because a
query might be incomplete and/or not fully parsable, the corresponding
Groum might not contain all necessary information or might not even be
DEFINITION 4 (Feature). A graph-based feature
is a sequence of the textual labels of the nodes along a path of a
Groum. A token-based feature is a lexical/textual token extracted from
in Figure 3a, there are a graph-based feature of size 1
[Shell.new], a graph-based feature of size 2 [Shell.new, Shell.pack], a
graph-based feature of size 3 [Shell.pack, Shell.open,
The query "for (Iterator _" is incomplete and can
not be parsed into an AST. Thus the corresponding Groum is unavailable.
However, GraPacc extracts two tokens for and Iterator,
and those two tokens could be used to match this query to the patterns
that have the usages involving a for loop and an Iterator
DEFINITION 5 (Similarity Measure). Function
sim measures the similarity of any two given features f and g based on
their textual similarity and the orders of their elements.
DEFINITION 6 (Context-sensitive Weights). Context-sensitive
weights measure the signiï¬cance of the features in a query based on
their relations to the focus editing position (user-based factor) and
based on the structure of the query's graph-based model
DEFINITION 7 (Relevance Measurement). Function
fit measures the relevance of a pattern P to a query Q based on the
similarity of their features, the context-sensitive weights, and the
popularity of pattern P.
DEFINITION 8 (Pattern-based Code Completion). Given
a query Q, as an incomplete fragment of code, a code completion tool
returns a list of patterns Pi, which are ranked based on
their relevance degrees toward Q, i.e. their fit(Pi;Q)
values. If a pattern P* is selected, the tool will complete Q with the
code corresponding to P* and with the proper replacements of program
elements in the current code.