material
|
file
|
description
|
Data source (OSS projects from GitHub)
|
- project_list.csv(73KB)
- hash_list.csv(45KB)
|
-
List of OSS projects:
The CSV file lists the 1000 projects. Each line gives the project's id, its name, and the URL of its Git repository.
-
List of commit hashes:
The CSV file lists the commit hashes corresponding to the projects above.
|
Variable data
|
- all_vars.zip(13.5MB)
- all_compound_vars.zip(10MB)
- var_similarity.zip(253MB)
|
- List of all variables:
The TSV (tab-separated values) file contains all variables' data collected from all projects.
Each line provides the project id, the file path, the variable kind ("L": local variable, "M": method's formal parameter, "F": field), the variable name, the variable type, the line number where the variable's scope begins, the line number where its scope ends, and the number of lines the scope spans.
It is compressed as a ZIP file.
- List of all variables with compound names:
The TSV file contains all variables with "compound names."
The format is the same as all_vars.txt, with one additional column, "words."
The "words" column gives the result of splitting the name into its component words.
- List of all variable pairs:
The CSV file contains the variable pairs with their similarity scores.
The columns id1 and id2 correspond to the variables' ids presented in all_compound_vars.txt.
The columns levsim and cossim are the Levenshtein similarity and cosine similarity (document vectors' similarity), respectively.
It is compressed as a ZIP file.
|
Prepared dictionary
|
- dot_aspell.en.pws.txt(1KB)
- abbreviated_word_dictionary.txt(1KB)
|
- Aspell user dictionary; rename it to ".aspell.en.pws" before use.
- Abbreviated word dictionary
|
Java program to extract local variables
|
JavaVariableScopeExtractor.jar(9.3MB)
|
See the tool site for details.
|
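The variable list described above can be read with a few lines of Python. The following is only a minimal sketch under stated assumptions: the eight columns appear in exactly the order given in the description, the archive member is named all_vars.txt, the file is UTF-8 encoded, and any line whose scope columns are not integers (for example a header line, if one exists) can be skipped. The names VariableRecord, read_variables, and read_variables_from_zip are hypothetical helpers and are not part of the published material.

```python
import csv
import io
import zipfile
from typing import Iterator, NamedTuple, TextIO


class VariableRecord(NamedTuple):
    """One line of the variable list (column order is an assumption)."""
    project_id: str
    file_path: str
    kind: str         # "L" = local variable, "M" = formal parameter, "F" = field
    name: str
    type_name: str
    scope_begin: int  # line number where the scope begins
    scope_end: int    # line number where the scope ends
    scope_lines: int  # number of lines the scope spans


def read_variables(stream: TextIO) -> Iterator[VariableRecord]:
    """Yield one record per well-formed tab-separated line."""
    # QUOTE_NONE keeps quote characters in names or types as plain text.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 8:
            continue  # too few columns; skip rather than guess
        try:
            begin, end, count = int(row[5]), int(row[6]), int(row[7])
        except ValueError:
            continue  # non-numeric scope columns, e.g. a possible header line
        yield VariableRecord(row[0], row[1], row[2], row[3], row[4],
                             begin, end, count)


def read_variables_from_zip(zip_path: str,
                            member: str = "all_vars.txt") -> Iterator[VariableRecord]:
    """Read records straight from the ZIP file (member name is an assumption)."""
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            yield from read_variables(text)


# Hypothetical sample line, made up for illustration only.
sample = "1\tsrc/Main.java\tL\tuserName\tString\t10\t20\t11\n"
records = list(read_variables(io.StringIO(sample)))
```

To process the real data, something like `for rec in read_variables_from_zip("all_vars.zip"): ...` should work if the assumptions above hold. Reading the records lazily avoids holding the whole file in memory.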