Experimental Studies and Evaluation

We have conducted several experimental studies to evaluate the performance/usefulness of our approach. In all experiments, we used WindowsXP, Intel Pentium 4 3Ghz, 1GB RAM.

1 Precision and Recall

The goal of this experiment is to compare the performance of iLSI method with that of traditional LSI in term of how precise and complete the iLSI method recovers the links.

The ``good'' performance of TLR is defined with two common IR measures for TLR: precision and recall. The precision value is defined as the number of correctly retrieved documents over the total number of retrieved documents. The recall value is defined as the number of correctly retrieved documents over the total number of relevant documents. In addition to the threshold approach, we used the top rank approach (also called cut point), which imposes a threshold for the number of recovered links, without considering the actual values of similarity measures.

Table 1. Setting up for LEDA experiment
LEDA 3.4 LEDA 3.4.1
Source files 497 517
Manual sections 115 117
Number of terms 1317 1360
LSI matrix dimensions 1317 x 612 1360 x 634

We used all source code and manual pages of two versions 3.4 and 3.4.1 of LEDA (Library of Efficient Data types and Algorithms). Two versions of LEDA are needed because we want to evaluate iLSI method in an evolving software. We use each source code file and manual section as an individual document in the Term-Document matrix.

Table 2. Comparison between LSI and iLSI: Precision and Recall using Cosine Threshold
Correct-links retrieved Incorrect-links retrieved Missed Links Total-links retrieved Precision Recall
0.6 73 69 45 47 39 43 118 116 61.8 59.4 65.1 61.6
0.65 65 60 20 23 47 52 85 83 76.5 72.3 58.0, 53.6
0.7 50 48 9 12 62 64 59 60 84.7 80.0 44.6, 42.8

Firstly, we run TLEM using the traditional LSI in batch mode for the LEDA system at version 3.4. LSI's matrices for this version are computed during the process and later used in TLR for the system at version 3.4.1. For the version 3.4.1 of LEDA, we performed two executions: one with traditional LSI and one with incremental LSI. At both times, we measured the precision and recall values using both threshold and top rank approaches. Table 2 summarizes the precision and recall results for LEDA 3.4.1 using thresholds for both methods (LSI and iLSI). The first value in each pair corresponds to the traditional LSI method, and the second to the iLSI. We used the thresholds from 0.6-0.7, which has been suggested to provide better retrieval results for texts in previous experiments. K was chosen at 10. We found 112 correct links which we used in computing precision and recall. Table 2 shows that with those selected thresholds (0.6-0.7) suggested by earlier research, iLSI has a slightly decrease in term of precision and recall. Precision values drop about 3-4, recall values drop about 4-5.

Table 3. Precision/Recall with top rank
for iLSI
624 Cut Correct Incorrect Missed Total Precision Recall
1 69 21 43 90 76.67 61.60
2 94 85 18 179 52.10 83.92
3 97 174 15 271 35.79 86.61
4 99 260 13 359 27.57 88.39
5 100 348 12 448 22.32 89.29
6 102 438 10 540 18.88 91.07
7 103 524 9 627 16.42 91.96
8 104 616 8 720 14.44 92.86
9 107 702 5 809 13.22 95.54
10 110 791 2 901 12.2 98.21
11 111 879 1 990 11.21 99.1
12 112 968 0 1080 10.37 100

We repeated our experiment with the top rank approach (Table 3). The table is similar to Table 2 except that the first column is the cut points, instead of thresholds. Figure 1 shows the comparison between LSI and iLSI using top rank approach. The results analyzed with top rank approach agree with the results analyzed with the threshold approach. The precision and recall values are slightly decrease but not as much as the ones with thresholds. Precision values drop about 1-3, while recall values drop about 4. Thus, the results are a bit better with the top rank approach. In brief, the results showed that iLSI maintains almost as good performance as the traditional LSI method. Also, since the decrease of precision and recall values might be accumulated, TLEM re-run LSI after ten times of iLSI execution.

Figure 1. Comparison: LSI and iLSI with top rank

2 Experiment on Time Cost

In this experiment, we evaluate time efficiency of iLSI in comparison with that of LSI. To observe the correlation between the reduction percentage K and the time iLSI could save, we run TLEM on several systems with different Ks.

Figure 2 shows the results. For each software system, we have two versions: v1 and v2. Remind that the time cost for TLR on v2 using incremental LSI method depends on k because matrix P's size is k x k. Thus, it depends on K. In contrary, the time to run traditional LSI method on v2 of a software system for TLR does not depend on k or K because traditional LSI would require the re-run of entire SVD computation for the large Term-Document matrix. With traditional LSI, k is chosen during the reduction step, i.e., after the SVD computation. For each chosen K, we run TLEM on v1 with the traditional LSI configuration. Then, we run with traditional LSI on v2. We recorded the time spent on v2 to produce the links (tLSI_time in Figure 2). Then, we run TLEM with the incremental LSI configuration on v2 using the LSI results from the previous run on v1 with each K value. We recorded the time for each execution (iLSI time) and compared it with the time that TLEM run with traditional LSI configuration on v2.

As seen in Figure 2 (Save), the incremental LSI method saves a large amount of time. With K= 10% and in large systems such as JDK or Apache, iLSI can reduce the time cost by 1500-1600 times. Importantly, link recovery time cost now is at the range of seconds to ten seconds, which is totally suitable for interactive use for traceability link evolution management during development. In Figure 2, more time can be saved for larger systems because the key point of iLSI is to reduce the SVD computational cost on large LSI matrices. Furthermore, when K increases, time cost in using iLSI method increases as well because the LSI matrix is larger. However, with K from 10-40, the TLR time is still small enough for interactive use. In brief, these results are consistent with our complexity analysis.

Figure 2. Time Cost with iLSI, threshold=0.7

3 Tradeoff between Precision, Recall, and Time Cost

As seen in Figure 2, the time saving ratio depends on LSI's reduction percentage K of LSI matrices. However, precision and recall might be affected as K varies. This experiment aims to explore this correlation. We chose Jigsaw-2.6.5 and 2.6.6. We varied K, executed TLEM using incremental LSI, and recorded precision and recall values, and time costs.

In Figure 3, with a larger K, the incremental LSI process took longer time. However, it produced a better precision value and a lower recall value. This is because the approximate matrix is closer to the original one in term of least squares. With K= 100%, the tool had to run SVD decomposition for a matrix of the size 2240 x 2409. With K=10%, i.e. with a matrix of 224 x 241, it took only 12.4s.

Figure 3. Tradeoff between Precision/Recall and Time Cost

4 Experiment on Space Complexity

When a developer checks their software artifacts into the repository, in addition to changes to those artifacts, TLEM also stores traceability links among them as well as LSI matrices (i.e. T01, S01, and D01) of the latest LSI computation for later use. Thus, we concerned with two sources of storage overhead in our approach: 1) the storage cost of LSI matrices, and 2) that of the traceability links themselves.

For each system in Table 4, we picked one version, filtered out irrelevant files, checked them into Molhado repository, and measured the total space required for that system. Then, we used TLEM to recovery the traceability links in that system. After that, we checked into the repository all artifacts, traceability links, and LSI matrices. We measured the total space again and calculated the overhead space.

Table 4. Space Complexity
System Storage cost for links Storage cost for matrices Total source + documentation
LEDA 3.4.1 4.8 kB 65 kB 6.8 MB
Jigsaw-2.6.6 18.0 kB 204 kB 49 MB
SC_01_07 23.5 kB 220 kB 51 MB
Apache-2.2.4 42.1 kB 311 kB 72 MB

Table 4 shows that the storage cost for traceability links themselves is very small, compared to the total cost for source code and documentation. More importantly, although the sizes of LSI's matrices are large, the storage cost for them is reasonably small. The main reason is that the LSI matrices are sparse, and S01 is even a diagonal matrix. T01 and D01 are the matrices of eigen vectors of the square symmetric matrices. Therefore, storing eigen vectors and values, and X1 is sufficient. Also, X1 is a sparse matrix because each document contains only a small number of distinctive terms. Furthermore, if an SCM system has good compression scheme, space saving ratio is even higher.