Experimental Studies and Evaluation
We conducted several experimental studies to evaluate the performance and
usefulness of our approach. All experiments were run on Windows XP with an
Intel Pentium 4 at 3 GHz and 1 GB of RAM.
The goal of the first experiment is to compare the performance of the iLSI
method with that of traditional LSI in terms of how precisely and completely
iLSI recovers the links.
"Good" TLR performance is defined by two IR measures commonly used for TLR:
precision and recall. Precision is the number of correctly retrieved
documents over the total number of retrieved documents. Recall is the number
of correctly retrieved documents over the total number of relevant documents.
In addition to the threshold approach, we used the top-rank approach (also
called cut point), which imposes a threshold on the number of recovered
links, without considering the actual values of the similarity measures.
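The two selection strategies and the two measures can be illustrated with a small sketch (this is our own illustrative code, not TLEM's implementation; the toy similarity values and link pairs are hypothetical):

```python
def precision_recall(retrieved, relevant):
    """Precision = |retrieved & relevant| / |retrieved|;
       Recall    = |retrieved & relevant| / |relevant|."""
    retrieved, relevant = set(retrieved), set(relevant)
    correct = len(retrieved & relevant)
    precision = correct / len(retrieved) if retrieved else 0.0
    recall = correct / len(relevant) if relevant else 0.0
    return precision, recall

def links_by_threshold(sim, threshold):
    """Keep every (source, target) pair whose cosine similarity
    meets the threshold."""
    return {(i, j) for (i, j), s in sim.items() if s >= threshold}

def links_by_cut_point(sim, cut):
    """Top rank (cut point): keep the `cut` highest-ranked targets per
    source, regardless of the actual similarity values."""
    by_source = {}
    for (i, j), s in sim.items():
        by_source.setdefault(i, []).append((s, j))
    return {(i, j)
            for i, targets in by_source.items()
            for _, j in sorted(targets, reverse=True)[:cut]}

# Toy similarity values: 2 source files x 3 manual sections.
sim = {(0, 0): 0.9, (0, 1): 0.4, (0, 2): 0.7,
       (1, 0): 0.3, (1, 1): 0.8, (1, 2): 0.5}
relevant = {(0, 0), (1, 1), (1, 2)}
p, r = precision_recall(links_by_threshold(sim, 0.7), relevant)
```

With the toy values above, the threshold 0.7 retrieves three links, two of which are correct, giving a precision and recall of 2/3 each.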
Table 1. Setup for the LEDA experiment

                          LEDA 3.4      LEDA 3.4.1
Source files              497           517
Manual sections           115           117
Number of terms           1317          1360
LSI matrix dimensions     1317 x 612    1360 x 634

We used all source code files and manual pages of versions 3.4 and 3.4.1 of
LEDA (Library of Efficient Data types and Algorithms). Two versions of LEDA
are needed because we want to evaluate the iLSI method on evolving software.
Each source code file and each manual section is treated as an individual
document in the term-document matrix.
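As an illustration (not TLEM's actual indexing code; the tokenization rule and toy documents are our assumptions), building such a term-document matrix might look like:

```python
import re
from collections import Counter

def term_document_matrix(documents):
    """Rows = distinct terms, columns = documents; entry (t, d) is the
    raw frequency of term t in document d."""
    tokenized = [re.findall(r"[a-z]+", doc.lower()) for doc in documents]
    terms = sorted({t for tokens in tokenized for t in tokens})
    index = {t: i for i, t in enumerate(terms)}
    matrix = [[0] * len(documents) for _ in terms]
    for d, tokens in enumerate(tokenized):
        for t, freq in Counter(tokens).items():
            matrix[index[t]][d] = freq
    return terms, matrix

# Two toy "documents": one source file, one manual section.
docs = ["list insert delete", "dictionary lookup insert"]
terms, X = term_document_matrix(docs)
```

For LEDA 3.4 this construction yields the 1317 x 612 matrix of Table 1 (1317 distinct terms, 497 source files plus 115 manual sections).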
Table 2. Comparison between LSI and iLSI: Precision and Recall using cosine threshold

            Correct links    Incorrect links   Missed links    Total links      Precision (%)    Recall (%)
            retrieved        retrieved                         retrieved
Threshold   LSI     iLSI     LSI     iLSI      LSI     iLSI    LSI     iLSI     LSI     iLSI     LSI     iLSI
0.60        73      69       45      47        39      43      118     116      61.8    59.4     65.1    61.6
0.65        65      60       20      23        47      52      85      83       76.5    72.3     58.0    53.6
0.70        50      48       9       12        62      64      59      60       84.7    80.0     44.6    42.8

First, we ran TLEM using traditional LSI in batch mode on the LEDA system at
version 3.4. The LSI matrices computed for this version were later used in
TLR for the system at version 3.4.1. For version 3.4.1 of LEDA, we performed
two executions: one with traditional LSI and one with incremental LSI. In
both cases, we measured precision and recall using both the threshold and
top-rank approaches. Table 2 summarizes the precision and recall results for
LEDA 3.4.1 using thresholds for both methods (LSI and iLSI). The first value
in each pair corresponds to the traditional LSI method, and the second to
iLSI. We used thresholds from 0.6 to 0.7, which previous experiments have
suggested provide better retrieval results for texts. K was chosen at 10%.
We found 112 correct links, which we used in computing precision and recall.
Table 2 shows that with those selected thresholds (0.6 to 0.7) suggested by
earlier research, iLSI shows a slight decrease in precision and recall:
precision values drop by about 3-4%, and recall values by about 4-5%.
Table 3. Precision/Recall with top rank for iLSI

Cut   Correct   Incorrect   Missed   Total   Precision (%)   Recall (%)
1     69        21          43       90      76.67           61.60
2     94        85          18       179     52.10           83.92
3     97        174         15       271     35.79           86.61
4     99        260         13       359     27.57           88.39
5     100       348         12       448     22.32           89.29
6     102       438         10       540     18.88           91.07
7     103       524         9        627     16.42           91.96
8     104       616         8        720     14.44           92.86
9     107       702         5        809     13.22           95.54
10    110       791         2        901     12.20           98.21
11    111       879         1        990     11.21           99.10
12    112       968         0        1080    10.37           100.00

We repeated our experiment with the top-rank approach (Table 3). The table is
similar to Table 2, except that the first column lists cut points instead of
thresholds. Figure 1 compares LSI and iLSI using the top-rank approach. The
results analyzed with the top-rank approach agree with those analyzed with
the threshold approach. Precision and recall decrease slightly, but not as
much as with thresholds: precision values drop by about 1-3%, while recall
values drop by about 4%. Thus, the results are a bit better with the top-rank
approach. In brief, the results show that iLSI maintains almost as good
performance as the traditional LSI method. Also, since the decreases in
precision and recall might accumulate, TLEM reruns full LSI after every ten
iLSI executions.
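Both tables rest on cosine similarities computed between document vectors in the reduced LSI space. A minimal sketch of that computation, assuming a rank-k truncated SVD X ≈ T_k S_k D_k^T (numpy is our choice here, not necessarily the authors' implementation):

```python
import numpy as np

def lsi_document_vectors(X, k):
    """Truncated SVD of the term-document matrix X; each document is
    represented by one row of D_k * S_k (a k-dimensional vector)."""
    T, s, Dt = np.linalg.svd(X, full_matrices=False)
    return Dt[:k].T * s[:k]          # shape: (num_docs, k)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 3-term x 3-document matrix.
X = np.array([[2., 0., 1.],
              [0., 3., 1.],
              [1., 1., 0.]])
vecs = lsi_document_vectors(X, k=2)
sim = cosine(vecs[0], vecs[1])       # similarity of documents 0 and 1
```

A link between documents 0 and 1 would be recovered if `sim` exceeds the chosen threshold, or if document 1 ranks within the cut point for document 0.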
Figure 1. Comparison: LSI and iLSI with top rank
In the next experiment, we evaluate the time efficiency of iLSI in comparison
with that of LSI. To observe the correlation between the reduction percentage
K and the time iLSI can save, we ran TLEM on several systems with different
values of K.
Figure 2 shows the results. For each software system, we used two versions:
v1 and v2. Recall that the time cost for TLR on v2 using the incremental LSI
method depends on k because matrix P's size is k x k; thus, it depends on K.
In contrast, the time to run the traditional LSI method on v2 of a software
system for TLR does not depend on k or K, because traditional LSI requires
rerunning the entire SVD computation for the large term-document matrix.
With traditional LSI, k is chosen during the reduction step, i.e., after the
SVD computation. For each chosen K, we ran TLEM on v1 with the traditional
LSI configuration. Then, we ran traditional LSI on v2 and recorded the time
spent on v2 to produce the links (tLSI_time in Figure 2). Next, we ran TLEM
with the incremental LSI configuration on v2, using the LSI results from the
previous run on v1, for each value of K. We recorded the time for each
execution (iLSI time) and compared it with the time TLEM took with the
traditional LSI configuration on v2.
As seen in Figure 2 (Save), the incremental LSI method saves a large amount
of time. With K = 10%, in large systems such as JDK or Apache, iLSI reduces
the time cost by a factor of 1,500-1,600. Importantly, the link recovery
time is now in the range of a few seconds to ten seconds, which is well
suited to interactive use for traceability link evolution management during
development. As Figure 2 shows, more time is saved for larger systems because
the key point of iLSI is to reduce the SVD computation cost on large LSI
matrices. Furthermore, as K increases, the time cost of the iLSI method
increases as well, because the LSI matrix is larger. However, with K from
10% to 40%, the TLR time is still small enough for interactive use. In brief,
these results are consistent with our complexity analysis.
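The savings come from replacing the full SVD rerun with an update whose cost is governed by the small k x k matrix P. The paper's exact iLSI update is not reproduced here; as a rough illustration of the general idea, the classic SVD fold-in technique projects new document columns into the existing k-dimensional space using only k-sized arithmetic (this sketch shows fold-in, not the authors' exact method):

```python
import numpy as np

def fold_in_documents(T_k, s_k, new_columns):
    """Project new term-document columns into an existing rank-k LSI
    space: d_hat = d^T T_k S_k^{-1}. No full SVD rerun is needed; the
    per-document cost depends on k, not on the full matrix size."""
    return new_columns.T @ T_k / s_k   # shape: (num_new_docs, k)

rng = np.random.default_rng(0)
X1 = rng.random((50, 20))              # old term-document matrix (v1)
T, s, Dt = np.linalg.svd(X1, full_matrices=False)
k = 5
T_k, s_k = T[:, :k], s[:k]             # rank-k factors kept from v1
new_docs = rng.random((50, 3))         # 3 new documents in v2, same terms
folded = fold_in_documents(T_k, s_k, new_docs)
```

Fold-in keeps T_k and S_k fixed, which is why the approximation degrades as updates accumulate and a periodic full recomputation (as TLEM does after ten iLSI runs) is a sensible safeguard.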
Figure 2. Time Cost with iLSI, threshold=0.7
As seen in Figure 2, the time-saving ratio depends on the reduction
percentage K of the LSI matrices. However, precision and recall might be
affected as K varies. This experiment explores that correlation. We chose
Jigsaw 2.6.5 and 2.6.6, varied K, executed TLEM using incremental LSI, and
recorded precision and recall values as well as time costs.
As Figure 3 shows, with a larger K, the incremental LSI process took longer.
However, it produced a better precision value and a lower recall value. This
is because the approximate matrix is closer to the original one in terms of
least squares. With K = 100%, the tool had to run the SVD decomposition for
a matrix of size 2240 x 2409. With K = 10%, i.e., with a matrix of size
224 x 241, it took only 12.4 s.
Figure 3. Tradeoff between Precision/Recall and
Time Cost
When developers check their software artifacts into the repository, in
addition to changes to those artifacts, TLEM also stores the traceability
links among them, as well as the LSI matrices (i.e., T_{01}, S_{01}, and
D_{01}) of the latest LSI computation for later use. Thus, we are concerned
with two sources of storage overhead in our approach: 1) the storage cost of
the LSI matrices, and 2) that of the traceability links themselves.
For each system in Table 4, we picked one version, filtered out irrelevant
files, checked them into the Molhado repository, and measured the total space
required for that system. Then, we used TLEM to recover the traceability
links in that system. After that, we checked all artifacts, traceability
links, and LSI matrices into the repository. We measured the total space
again and calculated the overhead.
Table 4. Space Complexity

System         Storage cost   Storage cost    Total source
               for links      for matrices    + documentation
LEDA 3.4.1     4.8 kB         65 kB           6.8 MB
Jigsaw 2.6.6   18.0 kB        204 kB          49 MB
SC_01_07       23.5 kB        220 kB          51 MB
Apache 2.2.4   42.1 kB        311 kB          72 MB

Table 4 shows that the storage cost for the traceability links themselves is
very small compared to the total cost of source code and documentation. More
importantly, although the LSI matrices are large, their storage cost is
reasonably small. The main reason is that the LSI matrices are sparse, and
S_{01} is even a diagonal matrix. T_{01} and D_{01} are matrices of
eigenvectors of square symmetric matrices; therefore, storing the
eigenvectors and eigenvalues, together with X_{1}, is sufficient. Also,
X_{1} is a sparse matrix because each document contains only a small number
of distinct terms. Furthermore, if the SCM system has a good compression
scheme, the space savings are even greater.
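The effect of sparsity on storage can be illustrated as follows (scipy, the matrix sizes, and the density here are our assumptions, not the authors' storage format):

```python
import numpy as np
from scipy import sparse

# A 1000 x 500 term-document matrix with ~1% nonzero entries: a
# compressed sparse format keeps only the nonzeros plus index arrays,
# while a dense float64 matrix would store every cell. The diagonal
# matrix S needs only its k singular values.
X1 = sparse.random(1000, 500, density=0.01, random_state=1, format="csr")
dense_bytes = X1.shape[0] * X1.shape[1] * 8          # dense float64 cost
sparse_bytes = X1.data.nbytes + X1.indices.nbytes + X1.indptr.nbytes
k = 100
s_diag_bytes = k * 8                                  # store diag(S) only
```

Under these assumptions the sparse representation is roughly two orders of magnitude smaller than the dense one, which is consistent with the modest matrix storage costs in Table 4.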