Problem Statement

The software engineering community has devoted extensive effort to making the semantic connections among software artifacts explicit, especially those between documentation and source code. These semantic connections are often called traceability links. Recovering traceability links in legacy systems is particularly important and helpful for a variety of software engineering tasks. These tasks include program understanding, requirement tracing and validation, analysis of the integrity and completeness of the implementation, monitoring and controlling the impact of changes, reverse engineering for reuse and re-development, and many other software maintenance tasks. For these reasons, much effort has been spent on building automatic tools for traceability link recovery (TLR) from software systems. TLR technology, which scans software artifacts including documentation and source code and then identifies the semantic connections among them, is now advanced and successful.

Unfortunately, many of these methods are inherently unsuited to evolving software systems. Simply recovering the traceability links is not sufficient due to the evolutionary nature of software. The links are probably valid only at a certain state or version of the software. A link represents a semantic relation between software documents. However, as software evolves during development and maintenance, software artifacts change as well. Therefore, changes to software artifacts might invalidate some traceability links, while other links remain unaffected. In other words, traceability links also evolve and need to be kept up to date during software development. To cope with software evolution, a naive approach to automatically updating traceability links is to re-run a TLR tool, which is too computationally costly for interactive use during software development. Furthermore, after re-running a TLR tool, changes to the links themselves cannot be captured automatically. For example, users must manually compare the results from a TLR tool before and after the changes to figure out which links were added, deleted, and so on. In addition, no existing traceability link management approach can deal with software changes effectively and automatically; each requires either human intervention or user feedback. In general, this represents a fundamental issue with existing traceability management approaches: traceability link evolution is not automatically managed during the evolutionary process of software development. Although traceability link recovery tools are automated, they cannot automatically evolve the links in an evolving software system. Therefore, strategies must be sought to manage traceability link evolution automatically and effectively.
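
The manual before-and-after comparison described above amounts to a set difference over two link sets. The following is a minimal sketch of that comparison only, assuming each link can be reduced to a (document, code) pair of identifiers; the names Link, LinkDiff, and diff_links are hypothetical and do not belong to any particular TLR tool.

```python
from typing import Iterable, NamedTuple


class Link(NamedTuple):
    """A traceability link, reduced here to a pair of artifact identifiers (an assumption)."""
    document: str
    code: str


class LinkDiff(NamedTuple):
    added: frozenset     # links present only after the change
    removed: frozenset   # links present only before the change
    retained: frozenset  # links present in both runs


def diff_links(before: Iterable[Link], after: Iterable[Link]) -> LinkDiff:
    """Compare the link sets produced by two TLR runs."""
    before_set = frozenset(before)
    after_set = frozenset(after)
    return LinkDiff(
        added=after_set - before_set,
        removed=before_set - after_set,
        retained=before_set & after_set,
    )


# Hypothetical results of a TLR run before and after a software change.
links_v1 = [Link("req-login.txt", "Auth.java"), Link("req-report.txt", "Report.java")]
links_v2 = [Link("req-login.txt", "Auth.java"), Link("req-report.txt", "PdfReport.java")]

diff = diff_links(links_v1, links_v2)
print(sorted(diff.added))
print(sorted(diff.removed))
print(sorted(diff.retained))
```

Such a comparison removes only the manual bookkeeping. It still presupposes a complete re-run of the TLR tool, so it does not address the computational cost discussed above.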

During software development, software is in constant evolution: software artifacts, including source code and documentation, are modified. Traceability links need not only to be recovered but also to be managed automatically as the software evolves. This is no small task for developers without automatic tools that support link evolution management and link update. Specifically, without dedicated, automatic support, developers who wish to maintain and evolve traceability links for their software maintenance tasks face the following challenges:

  • Current TLR technology produces traceability links that are easily invalidated as soon as the documentation or source code changes. That is, TLR does not adapt well to software evolution.
  • Changes to software artifacts can be managed with a software configuration management (SCM) system, but traceability links are not automatically managed and updated when the software changes. Therefore, developers must use ad hoc methods to maintain links or update them manually.
  • When changes occur, invalidated links that no longer connect the right documentation or source code cannot be updated without re-running a TLR process, which is too computationally costly to be used interactively in software development, as we have seen in our motivating examples. Even with the results from a re-run of a TLR tool, developers must manually compare them with the previous set of links to update the links.
  • Information about traceability links cannot be reused in future tasks unless a TLR tool is re-applied.