Duplicate Bug Report Detection with A Combination of
Information Retrieval and Topic Modeling

Anh Tuan Nguyen, Tung Thanh Nguyen, Tien N. Nguyen, David Lo, and Chengnian Sun


Data and Tool


tool.zip contains the following files:

  • BugDup.jar is an executable file that parses an input text data file and writes the results to dump.txt.
  • config.dbtm is the configuration file for the parsing and duplicate bug report (BR) detection process.
  • run.bat is a batch file that runs BugDup.jar.


    The tool is run from the command line as follows:

    java -Xmx2000m -jar BugDup.jar -c config_file input_data_file.txt
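
The command above can also be launched from a script. The following is a minimal Python sketch, assuming that java is on the PATH and that BugDup.jar, the configuration file, and the input file are all in the working directory. The helper name build_command and its parameters are hypothetical and are not part of the tool.

```python
import subprocess

def build_command(config_file, input_file, jar="BugDup.jar", max_heap_mb=2000):
    """Return the argument list for the java invocation shown above."""
    return [
        "java",
        f"-Xmx{max_heap_mb}m",  # maximum Java heap size, as in the command above
        "-jar", jar,
        "-c", config_file,      # configuration file, e.g. config.dbtm
        input_file,             # input text data file
    ]

if __name__ == "__main__":
    cmd = build_command("config.dbtm", "input_data_file.txt")
    print(" ".join(cmd))
    # Uncomment to actually run the tool (requires Java and the files above):
    # subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.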


    We used the data given by the authors of "Towards More Accurate Retrieval of Duplicate Bug Reports".


    The structure of an input_data_file.txt file is as follows:

    [BugReportDescription 1]

    [BugReportDescription 2]

    ...

    [BugReportDescription n]


    Each [BugReportDescription i] has the following structure (each field forms one line):

    ID=[Number] // ID number of the bug report

    PS=[Text] // summary

    PD=[Text] // Description

    DID=[Number] // ID number of one duplicate bug report of i (if any); otherwise leave it empty

    COMP=[Text] // Name of the component of the system relevant to the bug report

    SUB_COMP=[Text] // Name of the sub-component of the system relevant to the bug report

    VER=[Text] // Version

    PRIO=[Number] // Priority

    ISSUE_TYPE=[Text] // Issue type

    The literal text (e.g., ID=) is the required field name and must appear in each description. The rest of each line is a placeholder to be filled in: [] gives the field's type, and the text after // is a note about the field.
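
As an illustration of the format described above, the following Python sketch writes one bug report as a [BugReportDescription] block and reads it back. It assumes that each field occupies exactly one line in the order listed, that the // notes are documentation only and are not written to the file, and that an empty DID is written as DID= with nothing after it. The function names and the sample values are hypothetical.

```python
# Field names in the order listed in the format description.
FIELDS = ["ID", "PS", "PD", "DID", "COMP", "SUB_COMP", "VER", "PRIO", "ISSUE_TYPE"]

def format_report(report):
    """Render one bug report (a dict) as a block of NAME=value lines."""
    lines = []
    for name in FIELDS:
        value = str(report.get(name, ""))
        # Each field must stay on one line, so collapse newlines and extra spaces.
        value = " ".join(value.split())
        lines.append(f"{name}={value}")
    return "\n".join(lines)

def parse_report(block):
    """Parse a block of NAME=value lines back into a dict of strings."""
    report = {}
    for line in block.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, value = line.partition("=")  # split at the first '=' only
        if name in FIELDS:
            report[name] = value
    return report

# Hypothetical sample record, for illustration only.
sample = {
    "ID": 2, "PS": "Crash on startup", "PD": "The editor crashes\nwhen opened.",
    "DID": 1, "COMP": "UI", "SUB_COMP": "Editor", "VER": "3.1", "PRIO": 3,
    "ISSUE_TYPE": "DEFECT",
}
text = format_report(sample)
print(text)
```

Collapsing whitespace in format_report is a design choice of this sketch: it keeps multi-line summaries and descriptions on a single line, as the format requires.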