DeepMind says its new code generation system is competitive with human programmers


Last year, San Francisco-based research lab OpenAI released Codex, an AI model for translating natural language commands into application code. The model, which powers GitHub’s Copilot feature, was then touted as one of the most powerful examples of machine programming, the class of tools that automates software development and maintenance.

Not to be outdone, DeepMind — the artificial intelligence lab backed by Google’s parent company Alphabet — claims to have improved on Codex in key areas with AlphaCode, a system capable of writing “competition-level” code. In contests hosted on Codeforces, a competitive programming platform, DeepMind claims AlphaCode achieved an average ranking in the top 54.3% across 10 recent contests with more than 5,000 entrants each.

Oriol Vinyals, senior researcher at DeepMind, says this is the first time a computer system has reached such a competitive level in programming competitions. “AlphaCode [can] read natural language descriptions of an algorithmic problem and produce code that not only compiles, but is correct,” he added in a statement. “[It] indicates that there is still work to be done to reach the level of top performers and advance the problem-solving capabilities of our AI systems. We hope this benchmark will lead to further innovations in problem solving and code generation.”

Learn to code with AI

Machine programming has been supercharged by AI over the past few months. At its Build developer conference in May 2021, Microsoft detailed a new feature in Power Apps that leverages OpenAI’s GPT-3 language model to help users choose formulas. Intel’s ControlFlag can autonomously detect errors in code. And Facebook’s TransCoder converts code from one programming language to another.

The applications are vast, which is why there is a rush to create such systems. According to a Cambridge University study, at least half of developers’ efforts are spent on debugging, costing the software industry an estimated $312 billion a year. AI-powered code suggestion and review tools promise to cut development costs while allowing coders to focus on creative, less repetitive tasks — assuming systems work as advertised.

Like Codex, AlphaCode – whose largest version contains 41.4 billion parameters, roughly four times the size of Codex – was trained on a snapshot of public GitHub repositories in the programming languages C++, C#, Go, Java, JavaScript, Lua, PHP, Python, Ruby, Rust, Scala and TypeScript. AlphaCode’s training dataset was 715.1 GB, about the same size as Codex’s, which OpenAI estimated at “over 600 GB”.

An example of the interface used by AlphaCode to meet programming challenges.

In machine learning, parameters are the parts of a model that are learned from training data. Broadly, the correlation between a model’s parameter count and its sophistication has held up remarkably well.

Architecturally, AlphaCode is what is known as a Transformer-based language model, similar to Salesforce’s CodeT5 code generator. The Transformer architecture is made up of two main components: an encoder and a decoder. The encoder contains layers that process input data, such as text and images, iteratively. Each encoder layer generates encodings that capture which parts of the input are relevant to one another, and passes them on to the next layer, up to the final encoder layer. The decoder then draws on these encodings to generate the output sequence.
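To make the “which parts of the input are relevant to each other” idea concrete, here is a minimal, dependency-free sketch of scaled dot-product self-attention, the core operation inside each Transformer layer. The toy two-dimensional token vectors are illustrative only and have nothing to do with AlphaCode’s actual weights or dimensions:

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over toy vectors.

    Each output is a weighted average of the value vectors, where the
    weights reflect how relevant each key is to the given query.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(len(values[0]))])
    return outputs

# Three toy token embeddings attending to each other (self-attention):
# queries, keys, and values all come from the same sequence.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = attention(tokens, tokens, tokens)
```

Because the weights are a softmax, each output vector is a convex combination of the value vectors, so every coordinate stays within the range of the inputs.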

Creation of a new benchmark

Transformers typically undergo semi-supervised learning, which involves unsupervised pre-training followed by supervised fine-tuning. Situated between supervised and unsupervised learning, semi-supervised learning works with partially labeled data, where the majority of examples lack labels. Here, the Transformers are first exposed to unlabeled data for which no labels have been defined in advance. During fine-tuning, the models then train on labeled datasets so they learn to perform particular tasks, such as answering questions, analyzing sentiment, and paraphrasing documents.

In the case of AlphaCode, DeepMind fine-tuned and tested the system on CodeContests, a new dataset created by the lab that combines problems, solutions, and test cases pulled from Codeforces with public programming datasets. DeepMind also tested the highest-performing version of AlphaCode – an ensemble of the 41-billion-parameter model and a 9-billion-parameter model – on actual programming contests hosted on Codeforces, running AlphaCode live to generate solutions to each problem.

On CodeContests, given up to a million samples per problem, AlphaCode solved 34.2% of the problems. And on Codeforces, DeepMind claims its overall performance ranked within the top 28% of users who had competed in the last six months.
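Sampling up to a million candidate programs per problem only works because candidates can be filtered against the problem’s example tests before anything is submitted. A minimal sketch of that filtering step, using hypothetical callables to stand in for generated programs, might look like this:

```python
def filter_candidates(candidates, example_tests):
    """Keep only candidate programs that pass every example test.

    `candidates` is a list of callables standing in for generated
    programs; `example_tests` is a list of (input, expected) pairs.
    """
    survivors = []
    for program in candidates:
        try:
            if all(program(inp) == expected
                   for inp, expected in example_tests):
                survivors.append(program)
        except Exception:
            # Generated code may crash; treat any error as a failure.
            continue
    return survivors

# Hypothetical sampled "programs" for a toy problem: double the input.
candidates = [
    lambda x: x * 2,   # correct
    lambda x: x + 2,   # wrong on most inputs
    lambda x: x / 0,   # crashes
]
tests = [(1, 2), (3, 6)]
passing = filter_candidates(candidates, tests)
```

In practice this filtering discards the vast majority of samples, and a clustering step then picks a small number of representative survivors to submit.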

“DeepMind’s latest paper is yet another impressive feat of engineering that shows there are still impressive gains to be made from our current Transformer-based models with ‘just’ the right sampling and training adjustments, and no fundamental changes to the model architecture,” Connor Leahy, a member of the open AI research effort EleutherAI, told VentureBeat via email. “DeepMind brings out the full toolkit of tweaks and best practices: clean data, big models, a whole suite of smart training tricks, and, of course, lots of math. DeepMind pushed the performance of these models much faster than I expected. The 50th-percentile competitive programming result is a huge leap forward, and their analysis clearly shows that it’s not ‘just memorization.’ The advances in coding models, from GPT-3 to Codex to AlphaCode, have been incredibly fast.”

Limitations of code generation

Machine programming is by no means a solved science, and DeepMind admits that AlphaCode has its limits. For example, the system does not always produce syntactically correct code in every language, particularly C++. AlphaCode also fares worse on harder code, such as that required for dynamic programming, a technique that solves complex problems by breaking them into simpler overlapping subproblems.
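For context, dynamic programming builds an answer by combining cached answers to smaller subproblems. A classic contest-style example (not drawn from the paper) is minimum coin change, where each subtotal reuses the answers already computed for smaller subtotals:

```python
def min_coins(coins, amount):
    """Fewest coins needed to total `amount`, via bottom-up dynamic
    programming; returns -1 if the amount cannot be made."""
    INF = float("inf")
    # best[t] = fewest coins that sum to t; 0 coins make 0.
    best = [0] + [INF] * amount
    for total in range(1, amount + 1):
        for c in coins:
            # Extend the best answer for the smaller subtotal by one coin.
            if c <= total and best[total - c] + 1 < best[total]:
                best[total] = best[total - c] + 1
    return best[amount] if best[amount] != INF else -1

result = min_coins([1, 3, 4], 6)  # two coins: 3 + 3
```

The trick AlphaCode reportedly struggles with is exactly this kind of reasoning: seeing that the problem decomposes into subproblems at all, not just writing the loop.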

AlphaCode can also be problematic in other ways. Although DeepMind has not probed the model for bias, code generation models, including Codex, have been shown to amplify toxic and flawed content from their training datasets. For example, Codex can be prompted to write “terrorist” when given the word “Islam,” and to generate code that appears superficially correct but poses a security risk, such as invoking compromised software or using insecure configurations.

Systems like AlphaCode – which, it should be noted, are expensive to produce and maintain – could also be misused, as recent studies have explored. Researchers from Booz Allen Hamilton and EleutherAI trained a language model called GPT-J to generate code capable of solving introductory computer science exercises, successfully bypassing a widely used programming plagiarism detection tool. At the University of Maryland, researchers found that current language models can generate false cybersecurity reports convincing enough to fool top experts.

The question remains whether malicious actors will use these types of systems in the future to automate the creation of large-scale malware. For this reason, Mike Cook, an AI researcher at Queen Mary University of London, disputes the idea that AlphaCode is bringing the industry closer to “problem-solving AI”.

“I think this result is not too surprising, given that text comprehension and code generation are two of the four big tasks in which AI has shown improvements in recent years… A challenge with this area is that outputs tend to be fault-intolerant: one wrong word, pixel, or musical note in an AI-generated story, artwork, or melody might not ruin it for us, but one missed test case in a program can bring down space shuttles and destroy economies,” Cook told VentureBeat via email. “While the prospect of putting this power in the hands of people who don’t know how to program is exciting, we have a lot of problems to solve before we get there.”

If DeepMind can solve these problems – and that’s a big if – it stands to make a comfortable profit in an ever-growing market. Among the practical areas the lab has recently tackled with AI – weather forecasting, materials modeling, atomic energy calculation, app recommendations, and data center cooling optimization – programming is among the most lucrative. Even migrating an existing codebase to a more efficient language like Java or C++ commands a princely sum; the Commonwealth Bank of Australia, for example, spent approximately $750 million over five years converting its platform from COBOL to Java.

“I can safely say that AlphaCode’s results exceeded my expectations. I was skeptical, because even in simple competitive problems it is often necessary not only to implement the algorithm, but also (and this is the hardest part) to invent it,” Codeforces founder Mike Mirzayanov said in a statement. “AlphaCode managed to perform at the level of a promising new competitor. I can’t wait to see what lies ahead.”

VentureBeat’s mission is to be a digital public square for technical decision makers to learn about transformative enterprise technology and conduct transactions.
