Experimental Studies and Evaluation
We have conducted several experimental studies to evaluate the performance/usefulness
of our approach. In all experiments, we used WindowsXP, Intel Pentium
4 3Ghz, 1GB RAM.
The goal of this experiment is to compare the performance of iLSI
method with that of traditional LSI in term of how precise and complete
the iLSI method recovers the links.
The ``good'' performance of TLR is defined with two common IR measures
for TLR: precision and recall. The precision value is
defined as the number of correctly retrieved documents over the total
number of retrieved documents. The recall value is defined as the
number of correctly retrieved documents over the total number of relevant
documents. In addition to the threshold approach, we used the top
rank approach (also called cut point), which imposes a
threshold for the number of recovered links, without considering the
actual values of similarity measures.
Table 1. Setting up for LEDA experiment
| |
LEDA 3.4 |
LEDA 3.4.1 |
| Source files |
497 |
517 |
| Manual sections
|
115 |
117 |
| Number of terms
|
1317 |
1360 |
| LSI matrix dimensions
|
1317 x 612 |
1360 x 634 |
|
We used all source code and manual pages of two versions 3.4 and
3.4.1 of LEDA (Library of Efficient Data types and Algorithms). Two
versions of LEDA are needed because we want to evaluate iLSI method
in an evolving software. We use each source code file and manual
section as an individual document in the Term-Document matrix.
Table 2. Comparison between LSI and iLSI: Precision
and Recall using Cosine Threshold
| |
Correct-links retrieved |
Incorrect-links retrieved |
Missed Links |
Total-links retrieved |
Precision |
Recall |
| |
| Threshold
|
LSI |
iLSI |
LSI |
iLSI |
LSI |
iLSI |
LSI |
iLSI |
LSI |
iLSI |
LSI |
iLSI |
| 0.6 |
73 |
69 |
45 |
47 |
39 |
43 |
118 |
116 |
61.8 |
59.4 |
65.1
|
61.6 |
| 0.65 |
65 |
60 |
20 |
23 |
47 |
52 |
85 |
83 |
76.5
|
72.3 |
58.0,
|
53.6 |
| 0.7 |
50 |
48 |
9 |
12 |
62 |
64 |
59 |
60 |
84.7
|
80.0 |
44.6, |
42.8 |
|
Firstly, we run TLEM using the traditional LSI in batch mode for
the LEDA system at version 3.4. LSI's matrices for this version are
computed during the process and later used in TLR for the system at
version 3.4.1. For the version 3.4.1 of LEDA, we performed two executions:
one with traditional LSI and one with incremental LSI. At both times,
we measured the precision and recall values using both threshold and
top rank approaches. Table 2 summarizes
the precision and recall results for LEDA 3.4.1 using thresholds
for both methods (LSI and iLSI). The first value in each pair corresponds
to the traditional LSI method, and the second to the iLSI. We used
the thresholds from 0.6-0.7, which has been suggested to provide better
retrieval results for texts in previous experiments. K was
chosen at 10. We found 112 correct links which
we used in computing precision and recall. Table 2
shows that with those selected thresholds (0.6-0.7) suggested by earlier
research, iLSI has a slightly decrease in term of precision and recall.
Precision values drop about 3-4, recall values
drop about 4-5.
Table 3. Precision/Recall with top rank
for iLSI
| 624
Cut |
Correct |
Incorrect |
Missed |
Total |
Precision |
Recall |
| 1 |
69 |
21 |
43 |
90 |
76.67
|
61.60
|
| 2 |
94 |
85 |
18 |
179 |
52.10
|
83.92
|
| 3 |
97 |
174 |
15 |
271 |
35.79
|
86.61
|
| 4 |
99 |
260 |
13 |
359 |
27.57
|
88.39
|
| 5 |
100 |
348 |
12 |
448 |
22.32
|
89.29
|
| 6 |
102 |
438 |
10 |
540 |
18.88
|
91.07
|
| 7 |
103 |
524 |
9 |
627 |
16.42
|
91.96
|
| 8 |
104 |
616 |
8 |
720 |
14.44
|
92.86
|
| 9 |
107 |
702 |
5 |
809 |
13.22
|
95.54
|
| 10 |
110 |
791 |
2 |
901 |
12.2
|
98.21
|
| 11 |
111 |
879 |
1 |
990 |
11.21
|
99.1
|
| 12 |
112 |
968 |
0 |
1080 |
10.37
|
100
|
|
We repeated our experiment with the top rank approach (Table 3).
The table is similar to Table 2 except that the first column is the
cut points, instead of thresholds. Figure 1 shows the comparison between
LSI and iLSI using top rank approach. The results analyzed with top
rank approach agree with the results analyzed with the threshold approach.
The precision and recall values are slightly decrease but not as much
as the ones with thresholds. Precision values drop about 1-3,
while recall values drop about 4. Thus, the
results are a bit better with the top rank approach. In brief, the
results showed that iLSI maintains almost as good performance as the
traditional LSI method. Also, since the decrease of precision and
recall values might be accumulated, TLEM re-run LSI after ten times
of iLSI execution.
Figure 1. Comparison: LSI and iLSI with top rank
In this experiment, we evaluate time efficiency of iLSI in comparison
with that of LSI. To observe the correlation between the reduction
percentage K and the time iLSI could save, we run TLEM on several
systems with different Ks.
Figure 2 shows the results. For each software system, we have two
versions: v1 and v2. Remind that the time cost for TLR on v2 using
incremental LSI method depends on k because matrix P's
size is k x k. Thus, it depends on K. In contrary,
the time to run traditional LSI method on v2 of a software system
for TLR does not depend on k or K because traditional
LSI would require the re-run of entire SVD computation for the large
Term-Document matrix. With traditional LSI, k is chosen during
the reduction step, i.e., after the SVD computation. For each chosen
K, we run TLEM on v1 with the traditional LSI configuration.
Then, we run with traditional LSI on v2. We recorded the time spent
on v2 to produce the links (tLSI_time in Figure 2).
Then, we run TLEM with the incremental LSI configuration on v2 using
the LSI results from the previous run on v1 with each K value.
We recorded the time for each execution (iLSI time) and compared
it with the time that TLEM run with traditional LSI configuration
on v2.
As seen in Figure 2 (Save), the incremental LSI method
saves a large amount of time. With K= 10% and
in large systems such as JDK or Apache, iLSI can reduce the time cost
by 1500-1600 times. Importantly, link recovery time cost now is at
the range of seconds to ten seconds, which is totally suitable for
interactive use for traceability link evolution management during
development. In Figure 2, more time
can be saved for larger systems because the key point of iLSI is to
reduce the SVD computational cost on large LSI matrices. Furthermore,
when K increases, time cost in using iLSI method increases
as well because the LSI matrix is larger. However, with K from
10-40, the TLR time is still small enough for
interactive use. In brief, these results are consistent with our complexity
analysis.
Figure 2. Time Cost with iLSI, threshold=0.7
As seen in Figure 2, the time saving
ratio depends on LSI's reduction percentage K of LSI matrices.
However, precision and recall might be affected as K varies.
This experiment aims to explore this correlation. We chose Jigsaw-2.6.5
and 2.6.6. We varied K, executed TLEM using incremental LSI,
and recorded precision and recall values, and time costs.
In Figure 3, with a larger K,
the incremental LSI process took longer time. However, it produced
a better precision value and a lower recall value. This is because
the approximate matrix is closer to the original one in term of least
squares. With K= 100%, the tool had to run SVD
decomposition for a matrix of the size 2240
x 2409. With K=10%, i.e. with a matrix of 224
x 241, it took only 12.4s.
Figure 3. Tradeoff between Precision/Recall and
Time Cost
When a developer checks their software artifacts into the repository,
in addition to changes to those artifacts, TLEM also stores traceability
links among them as well as LSI matrices (i.e. T01,
S01, and D01) of the latest LSI
computation for later use. Thus, we concerned with two sources of
storage overhead in our approach: 1) the storage cost of LSI matrices,
and 2) that of the traceability links themselves.
For each system in Table 4, we picked
one version, filtered out irrelevant files, checked them into Molhado
repository, and measured the total space required for that system.
Then, we used TLEM to recovery the traceability links in that system.
After that, we checked into the repository all artifacts, traceability
links, and LSI matrices. We measured the total space again and calculated
the overhead space.
Table 4. Space Complexity
| System
|
Storage
cost for links |
Storage
cost for matrices |
Total source
+ documentation |
| LEDA 3.4.1
|
4.8 kB |
65 kB |
6.8 MB
|
| Jigsaw-2.6.6
|
18.0 kB
|
204 kB
|
49 MB |
| SC_01_07
|
23.5 kB
|
220 kB
|
51 MB |
| Apache-2.2.4
|
42.1 kB
|
311 kB
|
72 MB |
|
Table 4 shows that the storage cost
for traceability links themselves is very small, compared to the total
cost for source code and documentation. More importantly, although
the sizes of LSI's matrices are large, the storage cost for them is
reasonably small. The main reason is that the LSI matrices are sparse,
and S01 is even a diagonal matrix. T01
and D01 are the matrices of eigen vectors of the
square symmetric matrices. Therefore, storing eigen vectors and values,
and X1 is sufficient. Also, X1
is a sparse matrix because each document contains only a small number
of distinctive terms. Furthermore, if an SCM system has good
compression scheme, space saving ratio is even higher.