S6: Semantics-Based Code Search

Our work on code search is designed to let programmers take advantage of the large repositories of available open-source code. Traditional code search engines such as Google's codesearch, Koders, or Krugle provide access to such repositories but don't really simplify the programmer's job of using the code. They take keywords and return potentially hundreds of candidate pieces of code. The programmer then has to go through each returned file and decide whether the code might be relevant. If it is, they have to read it in detail to determine whether it is exactly what they want or at least close to it. Finally, they have to adapt the code to meet their particular requirements for naming, formatting, error handling, and so on.

We feel that a better approach is to have the programmer provide more precise information about what they want and then have the system do the grunt work: checking the returned code fragments, modifying the code to do what the programmer wants, and transforming the code to fit into the target framework. Our search front end has the programmer define the semantics of what they want. This includes keywords as an informal description, a signature, test cases and contracts (via JML) for functional specifications, security constraints (using the Java security model), and threading constraints (not fully implemented). In addition, the user can provide a context into which the code will fit. The front end attempts to make these specifications easy to provide.
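To make these kinds of specification concrete, here is a minimal sketch of what a query for a string-reversal method might bundle together. The `SearchSpec` and `TestCase` records, the `passes` check, and the example query are hypothetical illustrations, not S6's actual input format. The `postcondition` predicate simply mirrors the JML `ensures` clause on `reverse` so the contract can be checked without a JML tool. The sketch assumes Java 16 or later for records.

```java
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.UnaryOperator;

public class SearchSpecExample {
    // One input / expected-output pair supplied by the programmer.
    record TestCase(String input, String expected) {}

    // Hypothetical bundle of the query parts: keywords, signature, test cases,
    // and a postcondition mirroring the JML "ensures" clause on reverse() below.
    record SearchSpec(List<String> keywords, String signature,
                      List<TestCase> tests, BiPredicate<String, String> postcondition) {

        // A candidate is accepted only if, on every test case, it satisfies
        // the contract and produces the expected output.
        boolean passes(UnaryOperator<String> candidate) {
            for (TestCase t : tests) {
                String result;
                try {
                    result = candidate.apply(t.input());
                } catch (RuntimeException e) {
                    return false; // an exception counts as a failed test
                }
                if (!postcondition.test(t.input(), result)) return false;
                if (!t.expected().equals(result)) return false;
            }
            return true;
        }
    }

    // JML annotations live in //@ comments, so plain javac ignores them;
    // a JML runtime assertion checker would enforce them.
    //@ requires s != null;
    //@ ensures \result != null && \result.length() == s.length();
    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // The hypothetical query: keywords, signature, two tests, and the contract.
    static SearchSpec reverseSpec() {
        return new SearchSpec(
            List.of("reverse", "string"),
            "String reverse(String)",
            List.of(new TestCase("abc", "cba"), new TestCase("", "")),
            (in, out) -> out != null && out.length() == in.length());
    }

    public static void main(String[] args) {
        SearchSpec spec = reverseSpec();
        System.out.println(spec.passes(SearchSpecExample::reverse)); // true
        System.out.println(spec.passes(String::toUpperCase));        // false: wrong output
        System.out.println(spec.passes(s -> s + s));                 // false: violates the contract
    }
}
```

In `main`, `String::toUpperCase` satisfies the length contract but fails the expected outputs, while `s -> s + s` fails the contract itself.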

The system works by using the keywords to query one of the available code search engines (or a local code search engine for code available at Brown) to get candidate files. Each class or method in these files (depending on what the user is searching for) is considered a potential solution. These solutions are then transformed using a set of about 30 transformations in an attempt to map the code into exactly what the programmer specified. The transformations range from the simple (e.g., renaming the method to match the signature) to the complex (e.g., finding a line in the method that computes a value of the return type and then doing a backward slice until the only free variables are values of the parameter types). All the solutions that can be transformed to match the signature are then tested against the given test cases, security constraints, and JML contracts. Additional transformations can be applied based on the results of the test cases. The solutions that pass the tests are then formatted according to the user's specified style, sorted by size, complexity, or performance on the test cases, and presented to the user.
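The pipeline described above (filter by signature, transform, test, rank) can be sketched in a few dozen lines. This is a simplified, hypothetical model rather than S6's implementation: candidate bodies are plain `String -> String` lambdas instead of Java source, only the simplest transformation (renaming) is applied, and ranking uses an invented `sizeInLines` field. The names `Candidate`, `renameTo`, `passesAll`, `search`, and `sampleCandidates` are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class S6PipelineSketch {
    // Hypothetical model of a method extracted from a search result. The body is a
    // String -> String function for simplicity; the declared types are what the filter checks.
    record Candidate(String name, String paramType, String returnType,
                     int sizeInLines, UnaryOperator<String> body) {}

    // The simplest transformation: rename the method to the requested name.
    static Candidate renameTo(Candidate c, String wantedName) {
        return new Candidate(wantedName, c.paramType(), c.returnType(), c.sizeInLines(), c.body());
    }

    // Run every test case; a wrong answer or an exception rejects the candidate.
    static boolean passesAll(Candidate c, Map<String, String> tests) {
        for (Map.Entry<String, String> e : tests.entrySet()) {
            try {
                if (!e.getValue().equals(c.body().apply(e.getKey()))) return false;
            } catch (RuntimeException ex) {
                return false;
            }
        }
        return true;
    }

    // Filter by signature, transform, test, then rank the survivors by size (smallest first).
    static List<Candidate> search(List<Candidate> found, String wantedName, String paramType,
                                  String returnType, Map<String, String> tests) {
        List<Candidate> passing = new ArrayList<>();
        for (Candidate c : found) {
            if (!c.paramType().equals(paramType) || !c.returnType().equals(returnType)) continue;
            Candidate t = renameTo(c, wantedName);
            if (passesAll(t, tests)) passing.add(t);
        }
        passing.sort(Comparator.comparingInt(Candidate::sizeInLines));
        return passing;
    }

    // Stand-ins for methods a keyword search for "reverse string" might return.
    static List<Candidate> sampleCandidates() {
        UnaryOperator<String> loopReverse = s -> {
            char[] cs = s.toCharArray();
            for (int i = 0, j = cs.length - 1; i < j; i++, j--) {
                char tmp = cs[i];
                cs[i] = cs[j];
                cs[j] = tmp;
            }
            return new String(cs);
        };
        return List.of(
            new Candidate("strRev", "String", "String", 8, loopReverse),
            new Candidate("flip", "String", "String", 1, s -> new StringBuilder(s).reverse().toString()),
            new Candidate("shout", "String", "String", 1, String::toUpperCase),       // fails the tests
            new Candidate("count", "String", "int", 1, s -> String.valueOf(s.length()))); // wrong return type
    }

    public static void main(String[] args) {
        List<Candidate> results = search(sampleCandidates(), "reverse", "String", "String",
                                         Map.of("abc", "cba", "", ""));
        for (Candidate c : results) {
            System.out.println(c.name() + ": size " + c.sizeInLines());
        }
    }
}
```

Running `main` should list two survivors, both renamed to `reverse`, with the one-line version first; `shout` is rejected by the tests and `count` by the signature filter.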

The system can be tried out at http://conifer.cs.brown.edu/s6 (the server is occasionally down).

Romanian translation of this page courtesy of Science Spaces

Papers

Semantics-Based Code Search, ICSE 2009, May 2009.

Specifying What to Search For, SUITE 2009, May 2009.

Images

Front end: [image: S6 front end]

Front end showing results: [image: front end with results]

Diagram of the internals: [image: internal view]

Software

The software is available at ftp://ftp.cs.brown.edu/u/spr/s6.tar.gz.