Code Synthesis Update 222 Nov 2016 | Max Johansen
Imagine being able to just tell a computer what you want it to do, rather than programming it and having to deal with annoying syntax, semicolons and debugging. For the last few months, we, the Code Synthesis team, have been working on just that. The problem resembles automated theorem proving, the use of computers to find proofs, which has recently seen significant breakthroughs through the use of artificial neural networks. An automated theorem prover starts from a statement of what is to be proved and derives a sequence of steps that reaches it. The same idea applies to our project: we start from a description of a program and generate the code for that program.
To catch up with the state of the art, we read the MIT Prophet paper. Prophet is a machine learning-based system that suggests edits to buggy programs based on fixes for similar errors in other software projects. The researchers used standard machine learning techniques such as maximum likelihood estimation, which picks the model parameters under which the observed data are most probable.
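To make maximum likelihood estimation concrete, here is a minimal sketch on a toy problem. It is not Prophet's actual model: the edit-type labels and the `fixes` list are made up for illustration, and we assume each observed fix is drawn independently from a fixed categorical distribution. Under that assumption the maximum likelihood estimate is simply the relative frequency of each edit type.

```python
from collections import Counter
import math

def mle_categorical(observations):
    """Maximum likelihood estimate of a categorical distribution.

    For independent draws, the likelihood is maximized by setting each
    probability to the observed relative frequency.
    """
    counts = Counter(observations)
    total = len(observations)
    return {label: count / total for label, count in counts.items()}

def log_likelihood(observations, probs):
    """Log-probability of the observations under the given distribution."""
    return sum(math.log(probs[obs]) for obs in observations)

# Hypothetical edit types seen in past bugfixes (illustrative data only).
fixes = [
    "add_null_check", "change_condition", "add_null_check",
    "replace_call", "add_null_check", "change_condition",
]

probs = mle_categorical(fixes)

# Any other distribution, such as a uniform one, explains the data less well.
uniform = {label: 1 / len(probs) for label in probs}
print(probs)
print(log_likelihood(fixes, probs), log_likelihood(fixes, uniform))
```

A real system would estimate the parameters of a richer model over features of the code, but the principle is the same: prefer the parameters that make the observed fixes most probable.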
We’ve also been researching LSTMs, neural networks that carry a “memory” of earlier inputs forward as they read a sequence, so that surrounding sentences can be used to predict the intention behind a description. In addition, we’ve been investigating thought vectors, a way to turn natural language descriptions of code snippets into fixed-length numerical vectors.
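As a rough illustration of the "memory" idea, the sketch below implements one step of a single-unit LSTM in plain Python using the standard gate equations. It is not our model: the weights are arbitrary placeholders rather than learned values, the inputs are made-up numbers standing in for encoded words, and a real network would use many units and a deep learning library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, weights):
    """One time step of a single-unit LSTM.

    weights maps each gate name to a tuple (w_x, w_h, bias).
    Returns the new hidden state h and the new cell state c.
    """
    def pre_activation(gate):
        w_x, w_h, bias = weights[gate]
        return w_x * x + w_h * h_prev + bias

    forget = sigmoid(pre_activation("forget"))         # how much old memory to keep
    write = sigmoid(pre_activation("input"))           # how much new information to store
    expose = sigmoid(pre_activation("output"))         # how much memory to reveal
    candidate = math.tanh(pre_activation("candidate")) # proposed new information

    c = forget * c_prev + write * candidate  # updated memory cell
    h = expose * math.tanh(c)                # output for this step
    return h, c

# Placeholder weights (assumed for illustration, not trained).
weights = {
    "forget": (0.5, 0.1, 1.0),
    "input": (0.6, -0.2, 0.0),
    "output": (0.3, 0.4, 0.0),
    "candidate": (1.0, 0.5, 0.0),
}

h, c = 0.0, 0.0
for x in [0.2, -0.7, 1.0]:  # stand-ins for an encoded sequence of words
    h, c = lstm_step(x, h, c, weights)
print(h, c)
```

The cell state `c` is what lets earlier inputs influence later outputs. With many units, the final hidden states form a fixed-length vector summarizing the whole sequence, which is one common way to obtain a thought vector.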
We’ve been using a dataset drawn from one of Berkeley’s most popular introductory computer science courses, CS 61A, to analyze the mapping between assignment descriptions and student submissions. (We had to write a web scraper to obtain the assignment descriptions.) Luckily, the course records whether each student’s code submission passed the requirements of the problem, so we can identify the changes students made to get their code to pass. In principle, this allows us to implement Prophet-like code suggestion techniques in concert with natural language descriptions of bugfixes.
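One way to pull out those changes can be sketched as follows. The data layout is an assumption made for illustration (a chronological list of `(source_code, passed)` pairs for one student and one problem), not the actual CS 61A format, and `fixing_edits` is a hypothetical helper. It uses the standard library's `difflib` to diff each failing submission against the passing one that immediately follows it.

```python
import difflib

def fixing_edits(submissions):
    """Return a unified diff for every failing -> passing transition.

    submissions: chronological list of (source_code, passed) tuples
    (an assumed layout, for illustration only).
    """
    edits = []
    for (prev_src, prev_ok), (next_src, next_ok) in zip(submissions, submissions[1:]):
        if not prev_ok and next_ok:
            diff = difflib.unified_diff(
                prev_src.splitlines(),
                next_src.splitlines(),
                fromfile="failing",
                tofile="passing",
                lineterm="",
            )
            edits.append("\n".join(diff))
    return edits

# Made-up submission history for a "square" exercise.
history = [
    ("def square(x):\n    return x + x", False),
    ("def square(x):\n    return x * 2", False),
    ("def square(x):\n    return x * x", True),
]

edits = fixing_edits(history)
for edit in edits:
    print(edit)
```

Only the last transition is reported here, since the first change moved from one failing version to another. Each resulting diff can then be paired with the assignment description as a training example.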
Learned natural language manifolds, colored using various clustering techniques.
We plan to augment Prophet’s approach with additional heuristic analysis, such as observations of the code execution process, and with thought vector processing. We believe that much insight can be gained by analyzing the mapping between natural language descriptions of code and the code itself.