Duplicate Bug Report Detection with A Combination of
Information Retrieval and Topic Modeling

Anh Tuan Nguyen, Tung Thanh Nguyen, Tien N. Nguyen, David Lo, and Chengnian Sun

 

Data and Tool

 

tool.zip contains the following files:

  • BugDup.jar is an executable file that parses an input text data file and writes the results to dump.txt.
  • config.dbtm is the configuration file for the parsing and duplicate bug report (BR) detection process.
  • run.bat is a batch script that runs BugDup.jar.

    Usage:

                           java -Xmx2000m -jar BugDup.jar -c config_file input_data_file.txt  
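    For batch experiments it can be convenient to launch the tool from a script. The following is a minimal Python sketch, not part of tool.zip, that assembles the command line shown above. The function names build_command and run_tool and their parameters are hypothetical; only the java arguments themselves come from the usage line.

```python
import subprocess

def build_command(config_file, input_file, jar="BugDup.jar", max_heap="2000m"):
    """Assemble the documented command line as an argument list."""
    return ["java", f"-Xmx{max_heap}", "-jar", jar, "-c", config_file, input_file]

def run_tool(config_file, input_file):
    """Run BugDup.jar and raise CalledProcessError on a non-zero exit code.

    Assumes java is on the PATH and BugDup.jar is in the working directory.
    """
    subprocess.run(build_command(config_file, input_file), check=True)
```

    Calling run_tool("config.dbtm", "input_data_file.txt") would then be equivalent to typing the usage line by hand, with the results written to dump.txt as described above.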

     

    We used the data given by the authors of "Towards More Accurate Retrieval of Duplicate Bug Reports".

     

    The structure of an input_data_file.txt file is as follows:

    [BugReportDescription 1]

    [BugReportDescription 2]

    ...

    [BugReportDescription n]

     

    Each [BugReportDescription i] has the following structure (each field forms a line):

    ID=[Number] // ID number of the bug report

    PS=[Text] // Summary

    PD=[Text] // Description

    DID=[Number] // ID number of one duplicate bug report of i (if any), or leave it empty

    COMP=[Text] // Name of the component of the system relevant to the bug report

    SUB_COMP=[Text] // Name of the sub-component of the system relevant to the bug report

    VER=[Text] // Version

    PRIO=[Number] // Priority

    ISSUE_TYPE=[Text] // Issue type

    The plain text (e.g., ID=) is the required field name and must appear in each description. The italic text is a placeholder to be filled in: [] gives the field's type, and // marks a note about the field.
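    The record layout above can be produced and read back with a few lines of code. The following Python sketch is only an illustration under stated assumptions: the helper names (FIELD_ORDER, format_report, parse_reports) are hypothetical, records are assumed to be written one after another with one field per line, each record is assumed to start with its ID= line, and multi-line text is collapsed onto a single line. How BugDup.jar itself handles other layouts is not specified here.

```python
# Field names in the order listed in the record structure above.
FIELD_ORDER = ["ID", "PS", "PD", "DID", "COMP", "SUB_COMP",
               "VER", "PRIO", "ISSUE_TYPE"]

def _one_line(value):
    """Collapse all whitespace (including newlines) into single spaces."""
    return " ".join(str(value).split())

def format_report(report):
    """Render one bug report dict as NAME=value lines, one field per line.

    Missing or None values are written as an empty field (e.g. "DID=").
    """
    lines = []
    for name in FIELD_ORDER:
        value = report.get(name)
        text = "" if value is None else _one_line(value)
        lines.append(f"{name}={text}")
    return "\n".join(lines)

def parse_reports(text):
    """Read records back into dicts of strings.

    Assumption: an ID= line opens a new record; blank lines and lines
    that do not start with a known field name are ignored.
    """
    reports = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        name, sep, value = line.partition("=")
        if not sep or name not in FIELD_ORDER:
            continue
        if name == "ID":
            current = {}
            reports.append(current)
        if current is not None:
            current[name] = value
    return reports
```

    For example, format_report({"ID": 101, "PS": "Crash on save", "DID": None}) yields nine lines beginning with ID=101, with DID= and the other unset fields left empty. The values in this example are made up for illustration and are not taken from the data set.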