Data files used in our paper

Paper title: A Survival Analysis-Based Prioritization of Code Checker Warning: A Case Study using PMD
Journal: Springer: Studies in Computational Intelligence
Status: published as
Roger Lee (ed.), Big Data, Cloud Computing, and Data Science Engineering. Studies in Computational Intelligence, vol. 844, pp. 69--83, Springer, Cham, Jan. 2020. [DOI 10.1007/978-3-030-24405-7_5]
Data files and their descriptions
List of investigated Java projects
(CSV format)
projects.csv
(8KB)
This CSV file contains 100 records, each giving the "Git repository URL," the "project's name," and the "clone date."
Lists of investigated Java source files
source_file_lists.zip
(445KB)
This ZIP file contains 100 directories whose names correspond to project names;
each directory has a text file listing the project's source file paths.
Note: The lists of the following three projects are empty because all of their source files appeared to be sample, example, test, documentation, or demo programs: Android-CleanArchitecture, gradle-retrolambda and java8-tutorial.
Lists of source file changes
(tab-separated values (TSV) format)
source_file_change_histories.zip
(28MB)
This ZIP file contains 100 directories whose names correspond to project names;
each directory has one TSV file: source_file_change_history.tsv.
In the TSV file, each line corresponds to one commit of one source file.
Each line has the following five columns:
  1. No.: Zero-based index of the file's changes (commits); "0" signifies the initial commit of the corresponding file.
    The index restarts at zero for each source file.
  2. Total count of changes: The number of changes (commits) made to the corresponding file, not counting the initial commit.
    This value also equals the file's maximum No.
    For example, if a source file was changed three times, the following four lines are presented:
    0    3    ... (initial commit)
    1    3    ...
    2    3    ...
    3    3    ... (last commit)
  3. Commit hash
  4. File path: Because a commit may rename or move the file, the path is given for every commit.
  5. Commit date
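The numbering rule above (No. runs from 0 up to the total count of changes) can be checked mechanically. The following is a minimal Python sketch, not part of the published data set. It assumes that the TSV file has no header row, that the five columns appear in the documented order, and that the rows of each source file are contiguous and start at No. 0. The function names and the sample hashes, paths and dates are hypothetical.

```python
import csv


def split_into_file_histories(lines):
    """Split change-history rows into one list per source file.

    Assumptions (not confirmed by the data description): no header row,
    five tab-separated columns in the documented order, and each file's
    rows are contiguous and begin with No. 0.
    """
    histories = []
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 5:
            continue  # skip blank or malformed lines
        no, total, commit_hash, path, date = row
        record = {
            "no": int(no),
            "total": int(total),
            "hash": commit_hash,
            "path": path,
            "date": date,
        }
        # A row with No. 0 marks the initial commit of a new file.
        if record["no"] == 0 or not histories:
            histories.append([])
        histories[-1].append(record)
    return histories


def is_consistent(history):
    """True if No. runs 0..total and every row repeats the same total."""
    total = history[0]["total"]
    return (
        len(history) == total + 1
        and all(r["total"] == total for r in history)
        and [r["no"] for r in history] == list(range(total + 1))
    )


# Made-up sample: one file changed three times (and renamed once),
# followed by a file that was never changed after its initial commit.
SAMPLE_LINES = [
    "0\t3\thash_a\tsrc/Foo.java\t2018-01-01",
    "1\t3\thash_b\tsrc/Foo.java\t2018-01-05",
    "2\t3\thash_c\tsrc/Bar.java\t2018-01-09",
    "3\t3\thash_d\tsrc/Bar.java\t2018-01-20",
    "0\t0\thash_e\tsrc/Baz.java\t2018-02-01",
]

for history in split_into_file_histories(SAMPLE_LINES):
    print(history[-1]["path"], len(history), is_consistent(history))
```

Because the functions accept any iterable of lines, an open file object for source_file_change_history.tsv can be passed in place of SAMPLE_LINES.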
Warnings made by PMD
(tab-separated values (TSV) format)
pmd_results.zip
(718MB)
This ZIP file contains 100 directories whose names correspond to project names;
each directory has a TSV file that lists the warnings reported by PMD for all versions of all source files. Its columns are: "the commit hash," "the file path," "the line number corresponding to the warning," and "the warning's priority."
Warning priorities are predefined in the PMD rule sets.
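As a quick way to inspect one of these TSV files, the warnings can be tallied by priority. The sketch below is illustrative only: it assumes no header row and that the four columns described above come first on each line, in that order. The function name and the sample rows are made up. Since the files are large, the function reads line by line rather than loading everything into memory.

```python
import csv
from collections import Counter


def count_by_priority(lines):
    """Count PMD warnings per priority value.

    Assumes tab-separated rows whose first four fields are commit hash,
    file path, line number and priority (no header row).
    """
    counts = Counter()
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 4:
            continue  # skip blank or malformed lines
        _commit_hash, _path, _line_no, priority = row[:4]
        counts[priority] += 1
    return counts


# Made-up sample rows; real hashes, paths and priorities will differ.
SAMPLE_LINES = [
    "hash_a\tsrc/Foo.java\t10\t3",
    "hash_a\tsrc/Foo.java\t42\t1",
    "hash_b\tsrc/Foo.java\t12\t3",
]

print(dict(count_by_priority(SAMPLE_LINES)))
```

An open file object can be passed directly as `lines`, which keeps memory use low for the larger projects.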
Warning lifetime data (by project)
(CSV format)
survival_data_by_project.zip
(16MB)
This ZIP file contains 100 directories whose names correspond to project names;
each directory has a CSV file in which each record gives "warning," "censor" and "lifetime (in days)."
The "censor" column is 0 if the warning sample is censored (its lifetime was not fully observed) and 1 otherwise.
Warning lifetime data (by kind of warning)
(CSV format)
survival_data_by_warning.zip
(14MB)
This ZIP file contains 259 CSV files whose names correspond to warning names;
each CSV file presents records of "warning," "censor" and "lifetime (in days)."
The "censor" column is 0 if the warning sample is censored (its lifetime was not fully observed) and 1 otherwise.
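To illustrate how the "censor" and "lifetime" columns fit together, here is a minimal Python sketch of the standard Kaplan-Meier estimator applied to (lifetime, censor) pairs. It is a generic textbook technique and not necessarily the procedure used in the paper. The restricted mean shown is only one common way to summarize a survival curve and is not claimed to be how the published expected lifetimes were computed. Loading the CSV is left out because the presence of a header row is not documented here. The function names and the sample values are hypothetical.

```python
from collections import Counter


def kaplan_meier(samples):
    """Kaplan-Meier survival curve from (lifetime_days, censor) pairs.

    censor follows the data files' convention: 1 means the end of the
    warning's lifetime was observed, 0 means the sample is censored.
    Returns a list of (time, survival probability) at each event time.
    """
    events = Counter()  # observed endings per time
    exits = Counter()   # all samples leaving the risk set per time
    for lifetime, censor in samples:
        exits[lifetime] += 1
        if censor == 1:
            events[lifetime] += 1

    at_risk = sum(exits.values())
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


def restricted_mean(curve, horizon):
    """Area under the step curve from 0 to horizon (in days)."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in curve:
        if t >= horizon:
            break
        area += (t - prev_t) * prev_s
        prev_t, prev_s = t, s
    return area + (horizon - prev_t) * prev_s


# Made-up sample: three observed endings and two censored samples.
samples = [(5, 1), (8, 0), (12, 1), (12, 1), (20, 0)]
curve = kaplan_meier(samples)
print(curve)
print(restricted_mean(curve, horizon=20))
```

Censored samples never lower the curve themselves; they only shrink the risk set for later event times, which is why the 0/1 flag has to be kept alongside each lifetime.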
List of the expected lifetimes of warnings
(CSV format)
survival_analysis_results.csv
(11KB)
This CSV file lists the 259 results of the survival analysis: each line corresponds to one kind of warning and gives the warning name and its expected lifetime (in days).