Duplicate Bug Report Detection with a Combination of
Information Retrieval and Topic Modeling

Anh Tuan Nguyen, Tung Thanh Nguyen, Tien N. Nguyen, David Lo, and Chengnian Sun

 

DBTM's Approach

Topic Model for Duplicate Bug Reports

Figure 1. Topic Model for Duplicate Bug Reports

 

To support the detection of duplicate bug reports, we develop a novel topic model, called T-Model, based on the topic modeling mechanism of LDA. Figure 1 shows the graphical notation of T-Model. The idea is as follows. Each bug report b_i is modeled by an LDA model, which is represented via three parameters: the topic proportion θ_bi, the topic assignment z_bi, and the selected terms w_bi. While θ_bi and z_bi are latent, the terms w_bi are observable and are determined by the topic assignment z_bi and the word selection ϕ.
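
To make the roles of θ, z, w, and ϕ concrete, the following is a minimal sketch of the standard LDA generative process for a single bug report. It illustrates only the per-report LDA component described above, not the full T-Model. The function names, the Dirichlet prior `alpha`, and the toy vocabulary are assumptions made for illustration and do not come from the paper.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_report(phi, alpha, n_words, rng):
    """Generate one bug report b_i under a plain LDA generative process.

    phi[k][v] is the probability of vocabulary term v under topic k
    (the word selection). alpha is a hypothetical Dirichlet prior over
    topics. Returns (theta, z, w): the topic proportion, the per-position
    topic assignments, and the selected term ids.
    """
    topic_ids = list(range(len(phi)))
    term_ids = list(range(len(phi[0])))

    # Latent: topic proportion of this report.
    theta = sample_dirichlet(alpha, rng)

    z, w = [], []
    for _ in range(n_words):
        # Latent: topic assigned to this word position, drawn from theta.
        topic = rng.choices(topic_ids, weights=theta)[0]
        # Observable: term drawn from the chosen topic's word distribution.
        term = rng.choices(term_ids, weights=phi[topic])[0]
        z.append(topic)
        w.append(term)
    return theta, z, w


# Toy setup (assumed): 2 topics over a 4-term vocabulary.
vocab = ["crash", "memory", "button", "layout"]
phi = [
    [0.6, 0.3, 0.05, 0.05],  # a "stability" topic
    [0.05, 0.05, 0.5, 0.4],  # a "UI" topic
]
rng = random.Random(0)
theta, z, w = generate_report(phi, alpha=[1.0, 1.0], n_words=20, rng=rng)
words = [vocab[v] for v in w]
```

In this sketch only `words` would be visible in a real bug report. Inference runs in the opposite direction, estimating `theta` and `z` (and `phi`) from the observed terms.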