Can real-world programming problems be solved with cutting-edge AI? This month, DeepMind explored that question, confronting the world with a new perspective on programming, as well as the capabilities and limits of artificial intelligence.
But what’s equally interesting are the lessons they’ve learned along the way – about what can and can’t be automated, and about the errors in our current datasets.
And even if the AI-generated solutions were not better than the solutions of human programmers, this has already raised questions about what this means for the future.
—DagsHub (@TheRealDAGsHub) February 15, 2022
“A promising new competitor”
London-based DeepMind, the AI arm of Google parent company Alphabet, has already achieved historic milestones, outperforming humans playing chess and go, and also proving better at predicting how proteins behave. fall back.
This month, DeepMind announced that it had also developed a system called AlphaCode for entering programming competitions, evaluating its performance in 10 different programming contests organized by the competitive programming site CodeForces – each with at least 5,000 different participants.
The results? AlphaCode” placed roughly at the level of the median competitor,” reported a DeepMind blog post, “marking the first time an AI code generation system has reached a competitive level of performance in programming competitions.”
DeepMind pointed out that real-world companies use these competitions to recruit — and present similar issues to job applicants during coding interviews.
In the blog post, Mike Mirzayanov, founder of CodeForces, was quoted as saying that AlphaCode’s results exceeded his expectations. He added: “I was skeptical because even in simple competitive problems, it is often necessary not only to implement the algorithm but also (and this is the hardest part) to invent it.
“AlphaCode has succeeded in positioning itself at the level of a promising new competitor. I can’t wait to see what awaits me!”
A paper by DeepMinds researchers acknowledges that it took a huge amount of computing power. A petaFLOP means a whopping 1,000,000,000,000,000 floating point operations per second. A petascale day maintains this rate for every second of a 24-hour day, for a total of approximately 86,400,000,000,000,000,000 operations.
“Sampling and training from our model took hundreds of petaFLOPS days.”
A footnote added that Google’s data centers performing these operations “purchase renewable energy equal to the amount consumed.”
How AlphaCode Works
The researchers explain their findings in a 73 page paper (yet unpublished or peer reviewed). The authors write that their system was first “pre-trained” on code in the public GitHub repositories, much like the old AI-powered code suggestion tool Copilot. (To avoid some of the controversies that have arisen around Copilot’s methodology, AlphaCode filtered the datasets it trained on, selecting code that was released under permissive licenses.)
The researchers then “tuned” their system on a small dataset of competitive programming problems, solutions, and even test cases, many of which were pulled directly from the CodeForces platform.
One thing they discovered? There is a problem with the datasets currently available on programming contest problems and solutions. At least 30% of these programs pass all test cases, but are not actually correct.
So the researchers created a dataset that includes more test cases to rigorously check for correctness, and they believe this greatly reduces the number of incorrect programs that would still pass all tests – from 30% to just 4%.
When it’s time to finally tackle programming challenges, “we create a massive amount of C++ and Python programs for every problem,” said the DeepMind blog. “Then we filter, group and re-rank these solutions into a small set of 10 candidate programs which we submit for external review.”
“The problem-solving capabilities required to excel in these competitions exceed the capabilities of existing AI systems,” explained DeepMind’s blog post, attributing “large-scale progress transformer models (which have recently shown promising capabilities for generating code)” combined with “large-scale sampling and filtering”.
The blog post shows that the researchers’ findings demonstrate the potential of deep learning, even for tasks that require critical thinking – expressing solutions to problems as code. DeepMind’s blog post described the system as part of the company’s mission to “solve intelligence” (which, his website described as “developing more general and capable problem-solving systems” – also known as artificial general intelligence).
The blog post added: “[W]We hope our results will inspire the competitive programming community.
Human programmers react
DeepMind’s blog post also includes comments from petr mitrichchevidentified as both a Google software engineer and a “world-class” competitive programmer, who was impressed that AlphaCode could even make progress in this area.
“Solving competitive programming problems is a very difficult thing to do, requiring both good coding skills and creative problem-solving,” Mitrichev said.
Mitrichev also provided comments for six of the solutionsnoting that several submissions had also included “useless but harmless” bits of code.
In a submission, AlphaCode declared an integer variable named x – then never used it. In another submission browsing the chart, AlphaCode unnecessarily sorted all adjacent vertices first (how deep in the graph they will lead). For another problem (requiring a computationally intensive “brute force” solution), AlphaCode’s additional code made its solution 32 times slower.
In fact, AlphaCode often simply implemented a massive brute-force solution, Mitrichev wrote.
But the AI system itself failed like a programmer, Mitrichev noted, citing a submission where, when the solution eluded him, AlphaCode “behaves a bit like a desperate human”. He actually wrote some code that just always provides the same answer as provided in the problem’s sample scenario, he wrote, “hoping it works in all other cases.”
“Humans do it too, and such hope is almost always wrong – as it is in this case.”
AlphaCode as a dog speaking poor English https://t.co/WMq7oHNZ5s
— Hacker News (@newsycombinator) February 6, 2022
So, how good were the AlphaCode results? CodeForce calculates a programmer’s rating (using the standard Elo rating system also used to rank chess players) – and AlphaCode scored 1,238.
But what’s more interesting is where that rating appears on a graph of all the programmers competing on CodeForce over the past six months. The researchers’ paper noted that AlphaCode’s estimated rating “is in the top 28% among these users.”
Not everyone was impressed. Dzmitry BahdanauAI researcher and associate professor at McGill University in Montreal, highlighted on Twitter that many CodeForce participants are high school or college students – and the time constraints on their problem solving have less impact on a pre-trained AI system.
But more importantly, AlphaCode’s process involves filtering a torrent of AI-generated code to find one that actually solves the problem at hand, so “the vast majority of programs generated by AlphaCode are fake.”
So, while it’s a promising direction to explore, Bahdanau doesn’t think it’s a programming milestone: “It’s not AlphaGo in terms of beating humans and not AlphaFold in terms of revolutionizing an entire field of science. We have work to do.
AI is not coming for your work as a developer https://t.co/DCIkvqRfdL
— TNW (@thenextweb) February 14, 2022
But where does this lead? Just before the conclusion of their paper, the AlphaCode researchers added two sentences noting the dystopian possibility that code-generating capabilities “could lead to systems that can write themselves and improve recursively, rapidly leading to more and more advanced systems”.
Their article also mentions another disastrous possibility: “an increase in the supply and a decrease in the demand for programmers”.
Fortunately, there are already historical precedents for how this will play out, and the article claims that “previous instances of partial programming automation (e.g., compilers and IDEs) have only moved the programmers to higher levels of abstraction and opened the field to more people.”
For at least some programmers, this has already caused some concern. Recently a programming student on Hacker News complained about “AlphaCode Anxiety(as well as concerns about the GitHub co-driver). “Now I feel like I’m racing against time until the career I’ve worked so hard for gets automated,” the student wrote.
When a blog post on CodeForces said “The future has arrived”, a worried programmer even claimed that “there is a limit to what humans should automate”. The programmer added emphatically that the DeepMind developers who built AlphaCode “believe they are irreplaceable, but they would be the first to be replaced.”
But the fact that AlphaCode finished in the bottom half was also met with very human bashing.
“AI is such a noob,” replied the first commenter.