Using Change Context with Statistical Learning for
API Code Recommendation

Anh Tuan Nguyen, Michael Hilton, Mihai Codoban, Hoan Anh Nguyen,
Lily Mast, Eli Rademacher, Tien N. Nguyen, and Danny Dig

Today’s programs use Application Programming Interfaces (APIs) extensively: even the “Hello World” program invokes an API method. One of the main challenges in software development is learning and remembering how to use APIs. Even developers who are familiar with some APIs are forced to re-learn them due to continuous API evolution.

The state-of-the-practice support for working with APIs comes in the form of code-completion tools integrated into IDEs. Code completion tools allow a user to type a variable and request a recommendation of a possible API method call. They are among the top-5 most used features of IDEs. Still, a developer learning an API (or trying to remember it) can waste a lot of time combing through a long list of API method names available on a receiver object. For example, invoking code completion on an object of type String from JDK 8 populates a list of 67 possible methods (and 10 additional methods inherited from superclasses). Carefully examining such a long list of API method names and reading the associated Javadoc documentation is tedious.

In this paper we present a novel approach to code completion centered around fine-grained code changes. The intuition behind our approach is that source code changes are repetitive and exhibit a high degree of regularity, so we can harvest them to significantly improve the accuracy of recommendations. For example, after changing the code to instantiate a fresh List object, a developer often changes the code to add a new element into the fresh List via List.add. However, the developer is unlikely to call List.remove just after instantiating a fresh List. Without being aware of the context of the change, a code completion tool would not be able to differentiate between recommending add and remove. In our approach, we mine fine-grained code changes as well as the context in which the recommender was invoked to build a statistical model for recommending the next API method call. Our approach can be thought of as a language model for fine-grained source code changes.
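As a concrete illustration of this intuition, consider the hypothetical editing session sketched below; the class, method, and variable names are invented for the example and are not taken from any particular project.

```java
import java.util.ArrayList;
import java.util.List;

class ChangeContextExample {
    List<String> collectNames() {
        // Fine-grained change the developer has just made:
        // a fresh List is instantiated.
        List<String> names = new ArrayList<>();
        // Invoking code completion on `names` right after that change:
        // a change-aware recommender ranks add(...) above remove(...),
        // because "instantiate, then add" is a frequent change pattern,
        // while removing from a freshly created list is not.
        names.add("example");
        return names;
    }
}
```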

We implemented our approach in a tool, APIREC, that computes the most likely API method call to be inserted at the requested location in a given part of the code. APIREC works in three stages: (i) it builds a corpus of fine-grained code changes from a training set, (ii) it statistically learns which fine-grained changes co-occur, and (iii) it computes and then recommends a new API call at a given location based on the current context and previous changes.
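To make stage (ii) concrete, the sketch below counts, for every pair of fine-grained changes that appear in the same changed file of a commit, how often they co-occur across the training corpus. This is a minimal sketch under our own simplifying assumptions; the Change record and method names are illustrative placeholders, not APIREC's actual implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CooccurrenceSketch {

    /** One fine-grained change, e.g. "added call: List.add". Illustrative only. */
    record Change(String label) {}

    /**
     * Stage (ii): for every pair of changes within the same changed file of a
     * commit, count how often they co-occur across the whole training corpus.
     */
    static Map<String, Map<String, Integer>> learnCooccurrences(List<List<Change>> changedFiles) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (List<Change> file : changedFiles) {
            for (Change a : file) {
                for (Change b : file) {
                    if (a != b) {
                        counts.computeIfAbsent(a.label(), k -> new HashMap<>())
                              .merge(b.label(), 1, Integer::sum);
                    }
                }
            }
        }
        return counts;
    }
}
```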
 
We developed an association-based inference model that learns from the corpus which changes frequently co-occur in the same changed file within a commit. APIREC also learns the surrounding context (e.g., for loops, preceding method calls) of the code changes. In addition, APIREC learns the appropriate weights to assign to the surrounding context and to the previous changes. Given the previous changes, the context of the recommendation invocation, and the inference model, APIREC computes the likelihood of inserting each candidate API method call. Finally, it ranks the candidate API calls by their computed likelihoods.
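The following is a minimal sketch of this ranking step, assuming the model reduces to association scores between observed events (previous changes or surrounding code tokens) and a candidate call, combined through two learned weights; the class, field, and method names are illustrative, not APIREC's actual formulation.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class RankingSketch {

    /** assoc.get(event).get(call): how strongly the event predicts the candidate call. */
    private final Map<String, Map<String, Double>> assoc;
    private final double changeWeight;   // learned weight for the previous changes
    private final double contextWeight;  // learned weight for the surrounding context

    public RankingSketch(Map<String, Map<String, Double>> assoc,
                         double changeWeight, double contextWeight) {
        this.assoc = assoc;
        this.changeWeight = changeWeight;
        this.contextWeight = contextWeight;
    }

    /** Sum of association scores between the observed events and the candidate call. */
    private double score(List<String> events, String candidate) {
        return events.stream()
                     .mapToDouble(e -> assoc.getOrDefault(e, Map.of())
                                            .getOrDefault(candidate, 0.0))
                     .sum();
    }

    /** Rank candidate API calls by the weighted combination of both scores. */
    public List<String> rank(List<String> previousChanges,
                             List<String> contextTokens,
                             List<String> candidates) {
        Comparator<String> byLikelihood = Comparator.comparingDouble(
                c -> changeWeight * score(previousChanges, c)
                   + contextWeight * score(contextTokens, c));
        return candidates.stream().sorted(byLikelihood.reversed()).toList();
    }
}
```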

Our empirical evaluation on real-world projects shows that APIREC has high accuracy in API code completion: the top-1 average accuracy for in-vocabulary recommendations is 59.5% and the top-5 accuracy is 75%. This is a significant improvement over the previous best-in-class approach: 2.4x for top-1 and 2.2x for top-5 accuracy.