Event-Flow Graphs for Efficient Path-Sensitive Analyses
Ahmed Tamrawi, Suresh Kothari

Motivation Example

Figure 1 shows the control flow graph (CFG) of function hwrng_attr_current_store from the Linux kernel (v3.12). The nodes T and ⊥ respectively denote the unique entry and exit nodes added to the CFG. This example concerns event traces that are needed to check the safe-synchronization property that requires that a lock of a synchronization object must be followed by an unlock of the same locked object over all feasible CFG paths. The example contains two event nodes highlighted in gray: mutex_lock_interruptible (rng_mutex) (e1) and mutex_unlock (rng_mutex) (e2).

Figure 1: CFG for Function hwrng_attr_current_store
Figure 1: CFG for Function hwrng_attr_current_store

In this example, the execution paths in the CFG are depicted as a tree in Figure 2(a, b) using: 1) the CFG’s branch nodes {c1, c2, c3, c4, c5}, and 2) the event nodes labeled e1 and e2.

Figure 2: Trees representing paths in a CFG
Figure 2: Tress representing paths in a CFG

In Figure 2, T and F respectively denote the true and false branches from a branch node. The double line in figure 2(b) represents the loop between c2 and c3. To handle the loop with a break, we distinguish the paths by considering three possibilities: 1) At the first iteration of the loop, the loop gets broken (i.e., a break statement is executed) (Figure 2(a)), 2) the loop is never entered because the loop’s condition is violated before entering the loop (Figure 2(a)), or 3) the loop is executed one or more times (Figure 2(b)), in which case the paths include the double line (E) denoting loop execution. There are thus four paths, A, B, C, and D in which the loop is broken at the first iteration, and one path F corresponding to the case in which the loop is not entered. The case in which the loop is executed at least once before a possible break results in four paths: three paths due to a break in the loop, labeled E-B, E-C, and E-D, and path E-F representing normal exit from the loop without a break. In all, the CFG in Figure 1 has the nine possible types of paths illustrated in Figure 2(a, b).

In the CFG, the branch nodes c2, c3, c4, and c5 are considered irrelevant branch nodes; a branch node is irrelevant if all paths branching from it lead to the same event or terminal node. The branch node c1 is a relevant branch node. For a given path P in a CFG, the sub-trace of the execution trace of P consisting of only the relevant branch and event nodes, is called the event trace of P. The paths B, C, D, F, E-B, E-C, E-D and E-F have the same event trace (Te1c1e2⊥). Paths with identical event traces are considered equivalent and grouped into one equivalence class. The event trace of path A is Te1c1⊥, and it is by itself in another equivalence class.

Each equivalence class is represented by a unique event trace of the paths in that class. Once those equivalence classes are computed, it is sufficient to analyze only the event traces for all equivalence classes to check the safe-synchronization property. For example, the CFG in Figure 1 results in two events traces: 1) the trace (Te1c1e2⊥) that represents the equivalence class of paths: B, C, D, F, E-B, E-C, E-D and E-F, and 2) the trace Te1c1⊥ that represents the equivalence class of path A. It is then sufficient to analyze only these two traces to cover all the paths in the CFG.

To summarize the important points:
1- All CFG paths with the same event trace are grouped into one equivalence class. Nine paths are grouped into two equivalence classes in this example.
2- Analyzing event traces is equivalent to analyzing all CFG paths.
3- A branch node is irrelevant if all the paths branching from it either lead to the same event node or lead to the terminal node. Four out five branch nodes are irrelevant in this example.