Build Code Analysis with Symbolic Evaluation
Ahmed Tamrawi, Hoan Anh Nguyen, Hung Viet Nguyen, Tien N. Nguyen

Approach Overview

Symbolic Dependency Graph (SDG) and V-model
To address the challenges of maintaining Makefiles, SYMake provides a symbolic evaluation algorithm that processes a Makefile and produces a single symbolic dependency graph (SDG) representing all possible build rules and the dependencies among files via commands. An SDG differs from the concrete dependency graph built by make in that file names and commands in an SDG might not be completely resolved into strings. Instead, the SDG’s node for a file refers to a data structure called a V-model, i.e., a graph-based representation of the symbolic string values of the file’s name. A V-model often contains symbols that represent inputs or data retrieved from the user’s environment. The SDG enables static analysis of Makefiles and supports program understanding.
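To make the idea of a partially resolved file name concrete, here is a minimal Python sketch of a V-model-like structure. The node types (Literal, Symbol, Concat, Select), the render function, and the symbol name SYM_DIR are hypothetical names chosen for illustration; they are assumptions, not SYMake's actual data structures.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple


@dataclass(frozen=True)
class Literal:
    text: str              # fully resolved string fragment


@dataclass(frozen=True)
class Symbol:
    name: str              # unresolved input, e.g. a value from the environment


@dataclass(frozen=True)
class Concat:
    parts: Tuple           # sub-values joined left to right


@dataclass(frozen=True)
class Select:
    options: Tuple         # alternative values, e.g. from conditional branches


def render(value) -> List[str]:
    """List every string the value may denote; symbols print as <NAME>."""
    if isinstance(value, Literal):
        return [value.text]
    if isinstance(value, Symbol):
        return [f"<{value.name}>"]
    if isinstance(value, Concat):
        pieces = [render(part) for part in value.parts]
        return ["".join(combo) for combo in product(*pieces)]
    if isinstance(value, Select):
        return [s for option in value.options for s in render(option)]
    raise TypeError(f"unknown node type: {type(value).__name__}")


# Hypothetical file name: an unknown directory, then server.o or client.o.
name = Concat((
    Symbol("SYM_DIR"),
    Literal("/"),
    Select((Literal("server"), Literal("client"))),
    Literal(".o"),
))
print(render(name))  # ['<SYM_DIR>/server.o', '<SYM_DIR>/client.o']
```

The point of the sketch is that one node can stand for several possible file names at once, with the unknown part kept as a symbol rather than forced into a string.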

Figure I shows part of the SDG along with the corresponding V-models of myMakefile [download myMakefile].

           Figure I: Symbolic Dependency Graph and V-models
            myMakefile source code  

Evaluation Trace Model (T-model)
During symbolic evaluation, for each resulting string value that represents part of a file name or a command in a rule in the SDG, SYMake also provides an acyclic graph (called a T-model) that represents its symbolic evaluation trace. That is, the T-model shows how that string value is initialized, manipulated, and computed via the Makefile’s program entities.
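As a rough illustration of an evaluation trace, the Python sketch below records, for one string value, the program entities that produced it and then collects the Makefile lines involved. The TraceNode class, its fields, and the two-line Makefile in the comments are hypothetical; they are not taken from myMakefile or from SYMake's implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TraceNode:
    kind: str                      # e.g. "literal", "assign", "ref", "concat"
    text: str                      # source text of the program entity
    line: int                      # line number in the Makefile
    inputs: List["TraceNode"] = field(default_factory=list)


def contributing_lines(root: TraceNode) -> List[int]:
    """Return the sorted Makefile lines that contribute to the traced value."""
    seen, lines, stack = set(), set(), [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:       # a node may be shared by several parents
            continue
        seen.add(id(node))
        lines.add(node.line)
        stack.extend(node.inputs)
    return sorted(lines)


# Hypothetical Makefile:
#   1: NAME = server
#   2: OBJ  = $(NAME).o
lit_server = TraceNode("literal", "server", 1)
assign_name = TraceNode("assign", "NAME = server", 1, [lit_server])
ref_name = TraceNode("ref", "$(NAME)", 2, [assign_name])
lit_ext = TraceNode("literal", ".o", 2)
server_o = TraceNode("concat", "$(NAME).o", 2, [ref_name, lit_ext])

print(contributing_lines(server_o))  # [1, 2]
```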

Figure II shows the T-models for the string "server.o" and for the symbolic node "SYM01", both from Figure I.
                                    Figure II: T-models for server.o (left), SYM01 (right)

Detection of Code Smells and Errors
We have used SYMake to develop algorithms and a tool to detect several types of code smells and errors in Makefiles:

  • Cyclic Dependency

  • Loop of Recursive Variables

  • Duplicate Prerequisites

  • Rule Inclusion
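Two of the checks listed above can be sketched on a simplified graph. The Python example below assumes targets and prerequisites are already concrete strings, whereas SYMake analyzes an SDG whose names may be symbolic. The function names and the rule set are hypothetical, and the depth-first search is a standard technique, not necessarily the authors' algorithm.

```python
from typing import Dict, List, Optional

WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on the current path, finished


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of targets, or None."""
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for pre in deps.get(node, []):
            state = color.get(pre, WHITE)
            if state == GRAY:              # back edge: pre is on the path
                return path[path.index(pre):] + [pre]
            if state == WHITE:
                found = visit(pre)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for target in deps:
        if color.get(target, WHITE) == WHITE:
            cycle = visit(target)
            if cycle:
                return cycle
    return None


def duplicate_prerequisites(deps: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each target to the prerequisites it lists more than once."""
    result: Dict[str, List[str]] = {}
    for target, pres in deps.items():
        seen, dups = set(), []
        for pre in pres:
            if pre in seen and pre not in dups:
                dups.append(pre)
            seen.add(pre)
        if dups:
            result[target] = dups
    return result


# Hypothetical rules: target -> prerequisites.
rules = {
    "app": ["server.o", "client.o", "server.o"],
    "server.o": ["server.c", "gen.h"],
    "gen.h": ["app"],
}
print(find_cycle(rules))               # ['app', 'server.o', 'gen.h', 'app']
print(duplicate_prerequisites(rules))  # {'app': ['server.o']}
```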

Renaming Support with SYMake
Another application is automatic variable renaming, where SYMake needs to find all code locations where the variable being renamed is initialized or referenced. The key challenge is that a variable name in make can be dynamically evaluated and composed from the value(s) of other variable(s). Our empirical evaluation shows that SYMake achieves 100% renaming accuracy on the subject systems we studied.
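The Python sketch below illustrates why a plain textual search misses references whose variable name is computed. It assumes every variable has a single known concrete value, which is a simplification: SYMake works with symbolic values and evaluation traces. The helper names and the variables P and server_SRC are hypothetical.

```python
import re
from typing import Dict, List

# Matches simple references such as $(P); nested forms resolve inside-out.
VAR_REF = re.compile(r"\$\(([A-Za-z0-9_]+)\)")


def resolve_name(name_expr: str, env: Dict[str, str], max_steps: int = 20) -> str:
    """Expand $(VAR) references inside a variable-name expression."""
    for _ in range(max_steps):   # bound guards against self-referential values
        expanded = VAR_REF.sub(lambda m: env.get(m.group(1), ""), name_expr)
        if expanded == name_expr:
            break
        name_expr = expanded
    return name_expr


def references_to(target: str, name_exprs: List[str], env: Dict[str, str]) -> List[str]:
    """Return the name expressions that evaluate to the target variable name."""
    return [expr for expr in name_exprs if resolve_name(expr, env) == target]


# Hypothetical setting: P = server, and three references in the Makefile:
# $(server_SRC), $($(P)_SRC), and $(client_SRC).
env = {"P": "server"}
name_exprs = ["server_SRC", "$(P)_SRC", "client_SRC"]
print(references_to("server_SRC", name_exprs, env))
# ['server_SRC', '$(P)_SRC']
```

A search for the literal text server_SRC would find only the first reference. The second one names the same variable only after P is evaluated, so a rename that skips it would silently break the build.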