Data and Supplementary Materials of Our Paper Submitted to AeSIR 2021

Paper title: An Investigation of Compound Variable Names Toward Automated Detection of Confusing Variable Pairs
Conference: The 1st Workshop on Automated Support to Improve code Readability (AeSIR2021)
Status: (published)
Proc. 36th IEEE/ACM International Conference on Automated Software Engineering Workshops, pp. 133–137, Nov. 2021
Materials: files and descriptions
Data source (OSS projects from GitHub)
  1. project_list.csv(73KB)
  2. hash_list.csv(45KB)
  1. List of OSS projects:
    The CSV file lists the 1000 projects. Each line gives the project's id, its name, and the URL of its Git repository.
  2. List of commit hashes:
    The CSV file presents the list of commit hashes corresponding to the above projects.
Variable data
  1. all_vars.zip(13.5MB)
  2. all_compound_vars.zip(10MB)
  3. var_similarity.zip(253MB)
  1. List of all variables:
    The TSV (tab-separated values) file contains the data of all variables collected from all projects. Each line gives the project id, the file path, the variable kind ("L": local variable, "M": method's formal parameter, "F": field), the variable name, the variable type, the line number where the variable's scope begins, the line number where its scope ends, and the number of lines the scope spans. It is compressed as a ZIP file.
  2. List of all variables with compound names:
    The TSV file contains all variables with "compound names." The format is the same as that of all_vars.txt, but it has an additional column, "words," which gives the result of splitting the name into words.
  3. List of all variable pairs:
    The CSV file contains the variable pairs with the similarity scores. The columns id1 and id2 correspond to the variables' ids presented in all_compound_vars.txt. The columns levsim and cossim are the Levenshtein similarity and cosine similarity (document vectors' similarity), respectively. It is compressed as a ZIP file.
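
The record layout described above for the variable list can be read with a few lines of Python. The following is only a minimal sketch: it assumes that the columns appear in the order in which they are listed in the description, that the file has no header row, and that the "words" column of the compound-name file, if present, comes after the eight common columns. The names VariableRecord and parse_variable_line are hypothetical and are not part of the published materials.

```python
from dataclasses import dataclass

# Variable kinds as documented in the description of the variable list.
KINDS = {"L": "local variable", "M": "method's formal parameter", "F": "field"}


@dataclass
class VariableRecord:
    project_id: str
    file_path: str
    kind: str
    name: str
    var_type: str
    scope_begin: int   # line number where the scope begins
    scope_end: int     # line number where the scope ends
    scope_lines: int   # number of lines the scope spans


def parse_variable_line(line: str) -> VariableRecord:
    """Parse one tab-separated line (assumed column order, no header row)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 columns, got {len(fields)}")
    # Only the first eight columns are used; any extra column is ignored here.
    project_id, file_path, kind, name, var_type, begin, end, count = fields[:8]
    if kind not in KINDS:
        raise ValueError(f"unknown variable kind: {kind!r}")
    return VariableRecord(project_id, file_path, kind, name, var_type,
                          int(begin), int(end), int(count))


if __name__ == "__main__":
    # Made-up sample line for illustration; it is not taken from the data set.
    sample = "1\tsrc/Main.java\tL\tfileName\tString\t10\t20\t11\n"
    print(parse_variable_line(sample))
```

If the actual files do contain a header row or use a different column order, the unpacking step would need to be adjusted accordingly.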
Prepared dictionary
  1. dot_aspell.en.pws.txt(1KB)
  2. abbreviated_word_dictionary.txt(1KB)
  1. Aspell user dictionary; rename it to ".aspell.en.pws" before use.
  2. Abbreviated word dictionary
Java program to extract local variables: JavaVariableScopeExtractor.jar (9.3MB). See the tool site for details.