Thesis Defense

 

"Towards Improving the Effectiveness of Automated Program Repair"

Qi Xin

Tuesday, April 24, 2018, at 1:00 P.M.

Room 269 (CIT 2nd Floor)

Debugging is laborious. Automated program repair (APR) aims to save developers time and effort by repairing a faulty program automatically. A typical APR technique accepts as input a faulty program and a fault-exposing test suite; its goal is to generate as output a correct, patched program that passes the test suite. A significant branch of current APR techniques is search-based: these techniques first define a set of modification rules that induce a space of candidate patches, and then search that space for a correct patch. Current search-based APR techniques face two main problems: (1) the search-space problem and (2) the patch-overfitting problem. For (1), they define a huge search space of patches in order to support repairing a large set of bugs, but they are not effective at finding a correct patch within such a huge space. For (2), they are prone to producing patches that are test-suite-overfitted, or overfitting: an overfitting patched program passes the test suite but does not actually, or does not fully, repair the bug.
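The generate-and-validate loop that search-based APR performs can be sketched as follows. This is a minimal, illustrative sketch, not any specific tool's algorithm: the modification rules, fault locations, and tests here are toy stand-ins.

```python
# Minimal sketch of a search-based (generate-and-validate) APR loop.
# All names and the toy "program" representation are illustrative.

def search_for_patch(program, modification_rules, fault_locations, test_suite):
    """Enumerate candidate patches; return the first that passes every test."""
    for loc in fault_locations:                  # suspicious locations, e.g. from fault localization
        for rule in modification_rules:          # e.g. replace/insert/delete a statement
            for candidate in rule(program, loc): # each rule yields patched program variants
                if all(test(candidate) for test in test_suite):
                    return candidate             # test-suite-adequate ("valid") patch
    return None                                  # search space exhausted

# Toy usage: the program maps x to a value; the intended behavior is x + 1.
buggy = lambda x: x - 1
rules = [lambda prog, loc: [lambda x: x + 1, lambda x: x * 2]]  # two candidate edits
tests = [lambda p: p(3) == 4, lambda p: p(0) == 1]              # fault-exposing tests
patched = search_for_patch(buggy, rules, [0], tests)
print(patched(5))
```

Note that the returned candidate is only test-suite-adequate: with a weaker test suite (e.g., only `p(3) == 4`), the incorrect candidate `x * 2` would also survive, which is precisely the overfitting problem described above.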

To address these two problems, we developed the APR techniques ssFix and sharpFix and the patch-testing technique DiffTGen. ssFix finds and reuses existing code fragments (from a code database) that are syntactically related to the context of a fault to produce patches for its repair. By leveraging such syntax-related code, ssFix creates a much smaller search space of patches that is nevertheless likely to contain a correct patch. We evaluated ssFix on 357 bugs in the Defects4J bug dataset. Our results demonstrate ssFix's effectiveness: it successfully repaired 20 bugs by generating valid patches. Compared with five other repair techniques (jGenProg, jKali, Nopol, HDRepair, and ACS), ssFix achieved better repair performance.
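One simple way to realize syntax-related code search is to rank database fragments by token-set similarity to the faulty code's context. The sketch below uses Jaccard similarity over a crude token set; this is a toy illustration of the idea, not ssFix's actual search algorithm, and the code database here is a hand-made stand-in.

```python
import re

def tokens(code):
    """Crude lexer: identifiers, keywords, and number literals as a token set."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+", code))

def rank_fragments(fault_context, code_db):
    """Rank candidate fragments by Jaccard similarity to the faulty code's context."""
    query = tokens(fault_context)
    def jaccard(frag):
        t = tokens(frag)
        return len(query & t) / len(query | t) if query | t else 0.0
    return sorted(code_db, key=jaccard, reverse=True)

# Toy code database of Java-like fragments.
db = [
    "if (idx < 0) return defaultValue;",
    "for (int i = 0; i < n; i++) sum += a[i];",
    "if (idx >= 0 && idx < list.size()) return list.get(idx);",
]
fault = "return list.get(idx);"   # faulty statement: may throw an out-of-bounds error
best = rank_fragments(fault, db)[0]
print(best)
```

The top-ranked fragment shares the tokens `list`, `get`, `idx`, and `return` with the fault context and happens to contain the missing bounds check, illustrating how syntactically related code can supply repair material.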

ssFix was built on the assumption that existing code fragments from a code database often contain the fix ingredients, i.e., the statements or expressions needed to produce a correct patch. We conducted an experiment to evaluate how often this assumption holds in practice. Looking at 103 simple bugs in the Defects4J dataset, we found that the exact fix ingredients exist in the code database we used (containing 66,341 Java projects) for 50 bugs, and parameterized fix ingredients exist for 80 bugs. We also conducted experiments to evaluate ssFix's code-search and code-reuse abilities. We found that ssFix was able to retrieve code fragments containing the fix ingredients for up to 69.1% of the simple bugs and was able to reuse 56.1% of such fragments to produce correct patches. Based on these experimental observations, we developed sharpFix as an improved version of ssFix. sharpFix uses different code-search and code-reuse methods than ssFix, and in our evaluation it showed better code-search, code-reuse, and repair abilities. For the 357 Defects4J bugs, sharpFix repaired a total of 36 bugs with correct patches generated. To our knowledge, among all existing APR techniques evaluated on the Defects4J dataset, sharpFix repaired the largest number of bugs.
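The notion of a parameterized fix ingredient can be illustrated by identifier abstraction: a retrieved fragment matches parametrically if renaming its variables yields the needed statement. The sketch below hand-supplies the renaming map; a real code-reuse step would have to infer it from the fault context, and the fragment and names here are invented for illustration.

```python
import re

def instantiate(fragment, renaming):
    """Rewrite identifiers in a retrieved fragment using a context-derived renaming map."""
    return re.sub(r"[A-Za-z_]\w*",
                  lambda m: renaming.get(m.group(0), m.group(0)),
                  fragment)

# The retrieved ingredient uses names from another project ...
ingredient = "if (items == null) return 0;"
# ... while the fault context names the same variable "values".
patched_stmt = instantiate(ingredient, {"items": "values"})
print(patched_stmt)
```

Here the exact ingredient (`items == null`) does not appear in the fault context, but the parameterized one does, which is why parameterized ingredients were found for more bugs (80) than exact ones (50) in the experiment above.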

A repair technique that uses a test suite as its correctness criterion can produce overfitting patches. Our patch-testing technique DiffTGen can identify an overfitting patch through test-case generation. To do so, it first employs a test generator to generate new test inputs that uncover semantic differences between the patched program and the original, faulty program. It next presents the differing outputs to an oracle for a correctness judgment. DiffTGen determines the patched program to be overfitting if the patched program's output is judged incorrect, and it can produce a test case based on a correct output provided by the oracle. We evaluated DiffTGen on 89 patches generated by four automated repair techniques for Java, 79 of which were likely to be overfitting. DiffTGen identified a total of 39 (49.4%) overfitting patches and yielded the corresponding test cases.
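The differential-testing idea can be sketched as: generate inputs, look for one where the patched and original programs disagree, and ask an oracle whether the patched behavior there is correct. This toy sketch uses random input generation and a programmatic reference as the oracle; the real technique uses an off-the-shelf test generator and an external oracle, and the programs below are invented examples.

```python
import random

random.seed(7)  # reproducible sketch

def find_overfitting_evidence(original, patched, oracle, input_gen, trials=1000):
    """Look for an input where the patched program differs from the original
    and the oracle judges the patched output incorrect.  Returns (input,
    expected) as a new test case witnessing overfitting, or None."""
    for _ in range(trials):
        x = input_gen()
        if patched(x) != original(x):      # semantic difference uncovered
            expected = oracle(x)           # correctness judgment on the difference
            if patched(x) != expected:
                return x, expected         # the patch is overfitting
    return None

# Toy scenario: the program should compute abs(x).
original = lambda x: x        # buggy: returns negative inputs unchanged
patched  = lambda x: -x       # passes a failing test like abs(-3) == 3, but overfits
oracle   = lambda x: abs(x)   # stands in for an external correctness oracle

evidence = find_overfitting_evidence(original, patched, oracle,
                                     lambda: random.randint(-10, 10))
print(evidence)
```

The returned pair is exactly the artifact described above: a generated input on which the patched program misbehaves, together with the oracle-supplied correct output, packaged as a new test case that exposes the overfitting patch.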

Host: Professor Steve Reiss