Data used in our SEAA2024 paper

Paper title: Fault-Proneness of Python Programs Tested By Smelled Test Code
Conference: The 50th Euromicro Conference on Software Engineering and Advanced Applications (SEAA 2024)
Status: published as Yuki Fushihara, Hirohisa Aman, Sousuke Amasaki, Tomoyuki Yokogawa, and Minoru Kawahara, "Fault-Proneness of Python Programs Tested By Smelled Test Code," Proc. 50th Euromicro Conference on Software Engineering and Advanced Applications, pp. 373-378, Aug. 2024.
data | file | description
project list | projects.csv (3KB) | list of the studied projects' repository URLs.
Note: We initially examined 100 repositories and selected 50 of them that satisfy the following conditions:
  1. They have more than 50 stars.
  2. Their commits have modified not only production code but also test code.
  3. They have more than 1000 commits.
  4. They have more than ten contributors.
  5. Their development has lasted more than two years.
  6. They have had one or more commits within the last twelve months.
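The six conditions above can be read as a simple filter over repository metadata. The following is a minimal sketch under assumptions: the Repo record and its field names are hypothetical (they are not columns of projects.csv), and the way each value was actually measured, for example how "two years" of development was determined, is not stated here.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Repo:
    # Hypothetical metadata record; the field names are illustrative only.
    stars: int
    commits: int
    contributors: int
    modifies_production_and_test: bool
    first_commit: date
    last_commit: date


def satisfies_conditions(repo: Repo, today: date) -> bool:
    """Return True if the repository meets all six selection conditions."""
    two_years = timedelta(days=2 * 365)    # assumed approximation of "two years"
    twelve_months = timedelta(days=365)    # assumed approximation of "twelve months"
    return (
        repo.stars > 50
        and repo.modifies_production_and_test
        and repo.commits > 1000
        and repo.contributors > 10
        and (repo.last_commit - repo.first_commit) > two_years
        and (today - repo.last_commit) <= twelve_months
    )
```

A candidate list could then be narrowed with something like `[r for r in candidates if satisfies_conditions(r, date.today())]`.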
table of production code and test smells | mapping.csv (325KB) | mapping.csv has 775 rows and 174 columns.
Each row corresponds to one production code file and gives its repository URL, file path, is_faulty (1: yes, 0: no), and, for each test smell, whether it occurs in the corresponding test code (1: occurs, 0: does not occur).
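As an illustration of how a table in this layout might be tallied, here is a minimal sketch that counts, for each test smell, how many production code files were tested by test code with that smell and how many of those are faulty. Only is_faulty is a column name taken from the description above; repository_url, file_path, smell_a, and smell_b are hypothetical names, and the sample rows are made up.

```python
import csv
import io

# Tiny made-up sample in the layout described above; the real header names
# (apart from is_faulty) and the smell IDs may differ.
SAMPLE = """repository_url,file_path,is_faulty,smell_a,smell_b
https://example.org/r1,pkg/a.py,1,1,0
https://example.org/r1,pkg/b.py,0,1,1
https://example.org/r2,lib/c.py,1,0,1
"""

META_COLUMNS = {"repository_url", "file_path", "is_faulty"}  # assumed names


def count_per_smell(csv_file):
    """Return {smell: (num_faulty, num_total)} over rows where the smell occurs."""
    counts = {}
    for row in csv.DictReader(csv_file):
        faulty = int(row["is_faulty"])
        for column, value in row.items():
            # Skip metadata columns and smells that do not occur in this row.
            if column in META_COLUMNS or int(value) != 1:
                continue
            n_faulty, n_total = counts.get(column, (0, 0))
            counts[column] = (n_faulty + faulty, n_total + 1)
    return counts


counts = count_per_smell(io.StringIO(SAMPLE))
```

To try this on the real file, the metadata column names in META_COLUMNS would first need to be matched to the actual header of mapping.csv.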
estimated probability list | estimated_prob.csv (2KB) | list of estimated probabilities that production code is faulty when it is tested by smelled test code.
Its columns are:
  • test_smell: the test smell's ID.
  • num_of_faulty_production_code: the number of faulty production code files tested by test code that has the corresponding test smell.
  • num_of_production_code: the number of production code files tested by test code that has the corresponding test smell.
  • prob: the estimated probability that production code is faulty when it is tested by test code that has the corresponding test smell.
  • lower: the 95% credible interval's lower bound.
  • upper: the 95% credible interval's upper bound.
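The columns above pair a point estimate with a 95% credible interval, which suggests a Bayesian estimate, but the estimation procedure itself is not described here. The following is only a sketch of one common way to obtain such values from the two counts, assuming a binomial model with a uniform Beta(1, 1) prior and an equal-tailed interval approximated by Monte Carlo sampling. All of these choices are assumptions and may not match the paper, so the output should not be expected to reproduce the values in estimated_prob.csv.

```python
import random


def estimate_fault_probability(num_faulty, num_total, draws=20000, seed=0):
    """Posterior mean and equal-tailed 95% credible interval under a Beta(1, 1) prior.

    The prior, the interval type, and the Monte Carlo approximation are
    assumptions of this sketch, not necessarily the paper's procedure.
    """
    alpha = 1 + num_faulty                   # posterior Beta parameters
    beta = 1 + (num_total - num_faulty)
    rng = random.Random(seed)                # fixed seed for repeatable output
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    lower = samples[int(0.025 * draws)]      # approximate 2.5% quantile
    upper = samples[int(0.975 * draws) - 1]  # approximate 97.5% quantile
    prob = alpha / (alpha + beta)            # exact posterior mean
    return prob, lower, upper


# Made-up counts, for illustration only.
prob, lower, upper = estimate_fault_probability(num_faulty=12, num_total=40)
```

With a different prior or a highest-density interval, the bounds would shift slightly, especially for smells with few observations.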
estimated probability list (co-occurring test smells) | estimated_prob_combi.csv (7KB) | list of estimated probabilities that production code is faulty when it is tested by test code in which two test smells co-occur.
Its columns are:
  • rank: the rank of the estimated probability.
  • test_smell: the IDs of the co-occurring test smells.
  • num_of_faulty_production_code: the number of faulty production code files tested by test code that has the corresponding two test smells.
  • num_of_production_code: the number of production code files tested by test code that has the corresponding two test smells.
  • prob: the estimated probability that production code is faulty when it is tested by test code that has the corresponding two test smells.
  • lower: the 95% credible interval's lower bound.
  • upper: the 95% credible interval's upper bound.
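The pairwise counts described above could be derived from per-file smell occurrences roughly as follows. This is a hedged sketch: the smell IDs and records are made up, the raw faulty ratio stands in for the estimated probability, and ranking in descending order is an assumption about how the rank column is defined.

```python
from itertools import combinations


def rank_smell_pairs(records):
    """records: iterable of (is_faulty, set_of_smell_ids), one per production code file.

    Returns rows (rank, pair, num_faulty, num_total, ratio), highest ratio first.
    The raw ratio is a stand-in for the estimated probability (an assumption).
    """
    counts = {}
    for is_faulty, smells in records:
        # Every unordered pair of smells present in the same test code.
        for pair in combinations(sorted(smells), 2):
            n_faulty, n_total = counts.get(pair, (0, 0))
            counts[pair] = (n_faulty + is_faulty, n_total + 1)
    ordered = sorted(counts.items(), key=lambda kv: kv[1][0] / kv[1][1], reverse=True)
    return [
        (rank, pair, n_faulty, n_total, n_faulty / n_total)
        for rank, (pair, (n_faulty, n_total)) in enumerate(ordered, start=1)
    ]


# Made-up records with hypothetical smell IDs.
records = [
    (1, {"smell_a", "smell_b"}),
    (1, {"smell_a", "smell_b", "smell_c"}),
    (0, {"smell_b", "smell_c"}),
]
ranking = rank_smell_pairs(records)
```

Pairs observed only once or twice get extreme ratios in this sketch, which is one reason an interval such as the lower and upper columns is useful alongside the point estimate.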