A Hybrid Approach for Source Code Migration

Anh Tuan Nguyen, Tien N. Nguyen, and Tung Thanh Nguyen

Introduction


Software vendors often develop the same product for multiple operating platforms and mobile devices, each in a different programming language. For example, the same mobile app could be developed for iOS (in Objective-C), Android (in Java), and Windows Phone (in C#). There is therefore an increasing need to migrate, or translate, source code from one programming language to another: software originally developed in one language is later migrated to another language for a different platform.

In this work, we introduce mppSMT, a hybrid model that combines the strengths of statistical machine translation (SMT) and rule-based approaches. Our implementation builds on Java2CSharp, a rule-based migration tool, and Phrasal, an SMT system. Both are mature and open source.

The key ideas of mppSMT include:

1. Using syntax-directed translation in which human-defined rules for syntactic units in Java2CSharp are used to guide the migration process of the coarse-grained program structures.

2. Using SMT within each coarse-grained program structure, because SMT can learn the mappings between the concrete names of program elements and APIs, as well as the mappings between the syntactic styles used in the two languages.

3. Merging the respective translated code for program structures to produce the final result.

4. Enhancing SMT by associating code with semantic annotations, including the data types and semantic roles of code tokens.

5. Enhancing SMT by learning grammar rules, using syntactic annotations.

6. Adapting Phrasal to learn new or updated API mappings whenever new pairs of corresponding versions in the two languages become available.
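
As a rough illustration of ideas 1 to 3, the toy Python sketch below translates one Java statement into C#. It is not the mppSMT implementation: the regular-expression rule stands in for a hand-written rule of the kind a rule-based tool applies to coarse-grained structures, the small phrase table stands in for mappings that an SMT system would learn from data, and every name in it (FOREACH_RULE, PHRASE_TABLE, translate_phrases, translate_statement) is hypothetical.

```python
import re

# Hypothetical phrase table: a stand-in for mappings an SMT model would learn
# from parallel Java/C# code. These three entries are illustrative only.
PHRASE_TABLE = {
    "System.out.println": "Console.WriteLine",
    "String": "string",
    "boolean": "bool",
}

# Hypothetical hand-written rule for one coarse-grained structure:
# Java's enhanced for loop, "for (T x : xs) { body }".
FOREACH_RULE = re.compile(
    r"for\s*\(\s*(?P<decl>[^:()]+?)\s*:\s*(?P<coll>[^()]+?)\s*\)"
    r"\s*\{(?P<body>.*)\}\s*$",
    re.S,
)


def translate_phrases(fragment: str) -> str:
    """Stand-in for the SMT step: substitute known phrases, longest first."""
    for src in sorted(PHRASE_TABLE, key=len, reverse=True):
        # Match whole names only, so "String" does not fire inside "StringBuilder".
        pattern = r"(?<![\w.])" + re.escape(src) + r"(?!\w)"
        target = PHRASE_TABLE[src]
        fragment = re.sub(pattern, lambda _m, t=target: t, fragment)
    return fragment


def translate_statement(stmt: str) -> str:
    """Rule for the outer structure, phrase translation inside it, then merge."""
    match = FOREACH_RULE.match(stmt.strip())
    if match is None:
        # No structural rule applies: fall back to phrase translation alone.
        return translate_phrases(stmt.strip())
    decl = translate_phrases(match.group("decl"))
    coll = translate_phrases(match.group("coll"))
    body = translate_phrases(match.group("body").strip())
    # Merge the separately translated parts into the target-language structure.
    return f"foreach ({decl} in {coll}) {{ {body} }}"


if __name__ == "__main__":
    print(translate_statement("for (String s : names) { System.out.println(s); }"))
```

The split mirrors the division of labor described above: the rule fixes the shape of the target structure, while the table-driven step handles names and API calls inside it. A real SMT component would score alternative translations probabilistically rather than apply a fixed lookup.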

Our empirical evaluation shows that mppSMT is more accurate than either the SMT or the rule-based approach used alone. Our human subjects also found the resulting code useful for code migration.