Build Code Analysis with Symbolic Evaluation
Ahmed Tamrawi, Hoan Anh Nguyen, Hung Viet Nguyen, Tien N. Nguyen

Empirical Evaluation

Accuracy and Efficiency Evaluation in Renaming

We selected the Makefiles of 7 subject systems from their open-source repositories, as shown in Table I.

                                            Table I: Subject Systems and Build Code Information

We asked six Ph.D. students to independently identify the locsets for consistent renaming, and we built a tool to assist them in this oracle-building task. For each string s, the tool performs a text search for all occurrences of s in a Makefile and in the Makefile(s) it includes. The human subjects then examined those occurrences and marked the ones that are variable names requiring consistent renaming as belonging to locsets.
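A minimal sketch of such a text-search helper, assuming a plain recursive scan over include directives (the function name and the include-resolution logic are our assumptions, not the actual tool):

```python
import re
from pathlib import Path

# Hypothetical sketch of the oracle-building helper: it only finds textual
# occurrences; deciding which occurrences are variable names requiring
# consistent renaming is left to the human subjects.
INCLUDE_RE = re.compile(r'^\s*-?include\s+(.+)$')

def find_occurrences(s, makefile, seen=None):
    """Yield (file, line_no, col) for every textual occurrence of s in
    `makefile` and, recursively, in the Makefiles it includes."""
    seen = seen if seen is not None else set()
    path = Path(makefile).resolve()
    if path in seen:          # guard against include cycles
        return
    seen.add(path)
    for line_no, line in enumerate(path.read_text().splitlines(), start=1):
        col = line.find(s)
        while col != -1:
            yield (str(path), line_no, col)
            col = line.find(s, col + 1)
        m = INCLUDE_RE.match(line)
        if m:
            for inc in m.group(1).split():
                inc_path = path.parent / inc
                if inc_path.exists():
                    yield from find_occurrences(s, inc_path, seen)

# Example: list all occurrences of the name OBJS for human review.
# for loc in find_occurrences("OBJS", "Makefile"):
#     print(loc)
```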

The entire process took more than 40 hours. In Table I, columns Locsets and Loc. show the number of locsets and the total number of locations across all locsets. A locset can contain multiple variables.

The complexity of the Makefiles is expressed via the numbers of their program elements:
- LOCs: lines of code in the Makefiles
- Files: number of Makefiles
- Vars: number of variables
- Rules: number of rules
- Paths: number of branches within the Makefiles
- Includes: number of included Makefiles
- Define: number of variables defined using define
- Foreach: number of foreach statements
- Eval: number of eval statements
- Call: number of call statements
- |SDG|: total number of nodes in the corresponding SDGs
- Time: time needed to build the SDGs
- Mem: memory size (in MB) for those SDGs

In Table II, precision (column Prec) is the ratio of correctly detected locsets over the total number of detected ones. Recall (column Rec) is the ratio of correctly detected locsets over the total number of locsets in the oracle. Column Inc-Text shows the number of incorrect locations reported by the text-search tool. Columns Frag and NonFrag show the numbers of locsets with fragmented and non-fragmented variables, respectively.
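Concretely, treating the detected locsets and the oracle as sets, the two measures reduce to the usual ratios, as in the following sketch (the function and the toy numbers are ours, for illustration only):

```python
def precision_recall(detected, oracle):
    """Precision = correctly detected / all detected;
    recall = correctly detected / all locsets in the oracle."""
    correct = len(detected & oracle)
    return (correct / len(detected) if detected else 0.0,
            correct / len(oracle) if oracle else 0.0)

# Toy example with locset IDs: 9 of 10 detections fall in a 12-locset oracle.
detected = set(range(9)) | {"spurious"}
oracle = set(range(12))
print(precision_recall(detected, oracle))  # (0.9, 0.75)
```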

                           
                                                     Table II: Locset Detection Accuracy Results

Usefulness in Smell Detection and Renaming

Experiment Setting:

We invited 8 Ph.D. students at Iowa State University, each with 4-8 years of programming experience, and divided them into 2 groups. We also provided a short training session for all subjects. We applied the crossover technique: Group 1 performed tasks 1, 3, and 5 using SYMake and the other tasks without SYMake, and vice versa for Group 2.
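For clarity, the crossover assignment can be written out as follows; the text names tasks 1, 3, and 5 explicitly, and the assumption of six tasks in total is ours, purely for illustration:

```python
# Hypothetical crossover assignment; six tasks total is an assumption.
TASKS = [1, 2, 3, 4, 5, 6]
GROUP1_WITH_TOOL = [1, 3, 5]  # stated in the text
assignment = {
    "Group 1": {"with SYMake": GROUP1_WITH_TOOL,
                "without SYMake": [t for t in TASKS if t not in GROUP1_WITH_TOOL]},
    # Group 2 receives the opposite assignment (the crossover).
    "Group 2": {"with SYMake": [t for t in TASKS if t not in GROUP1_WITH_TOOL],
                "without SYMake": GROUP1_WITH_TOOL},
}
```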

To view the assigned tasks, please visit the [Controlled Experiment Tasks Section].

Controlled Experiment Results:

The tool's usefulness is measured via code quality and developer effort. For quality, in the detection tasks, we compared the numbers of smells correctly detected, incorrectly detected, and missed with and without the tool (the two numbers appear side by side in Table III). Precision and recall for location detection are shown in columns Prec-Loc and Recall-Loc. In the renaming tasks, we calculated the precision (Prec) and recall (Rec) of the renaming locations. Developer effort is measured via finishing time; a subject who exceeded the time limit was required to stop.
         
                                                     Table III: Controlled Experiment Results
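As a rough sketch of how such side-by-side numbers can be aggregated from per-subject records (the field names and values below are our assumptions, not the actual data):

```python
from statistics import mean

# Hypothetical per-subject records; field names and numbers are illustrative only.
results = [
    {"subject": "S1", "tool": True,  "correct": 5, "incorrect": 1, "missed": 0, "minutes": 18},
    {"subject": "S2", "tool": False, "correct": 3, "incorrect": 2, "missed": 2, "minutes": 30},
    # ... one record per subject and task
]

def summarize(records, with_tool):
    """Average the quality counts and finishing time for one condition."""
    rows = [r for r in records if r["tool"] is with_tool]
    return {k: mean(r[k] for r in rows) for k in ("correct", "incorrect", "missed", "minutes")}

side_by_side = {"with SYMake": summarize(results, True),
                "without SYMake": summarize(results, False)}
print(side_by_side)
```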
- For detailed results for each subject, you can download their individual results. [All-Results.zip 95KB]
- For detailed aggregated results from all subjects, you can download them here. [ControlledExperiment-Results.xlsx 22KB]