5 AI tools that can generate code to help programmers


One of the most recent advances in natural language processing (NLP) is the emergence of large language models (LLMs), which are trained on very large datasets. Several LLMs are available, such as Google’s BERT and OpenAI’s GPT-2 and GPT-3. With these models, it is possible to generate everything from simple text to real financial models.

AI startups including OpenAI, Hugging Face, Cohere, and AI21 Labs are pushing the limits of LLMs by training models with billions of parameters.

Here are five AI-powered code generators based on large language models that can generate high-quality code:

1. OpenAI Codex

OpenAI Codex is the GPT-3-based model that powers GitHub Copilot, a tool from Microsoft that generates code in the VS Code development environment. It claims to write code in at least a dozen languages, including JavaScript, Go, Perl, PHP, Ruby, Swift, TypeScript, and even Bash. The model is trained on billions of lines of publicly available code, such as GitHub repositories.

OpenAI has made the model available through a private beta to developers and platform companies so they can build tools and integrations.

2. Tabnine

While Tabnine is not an end-to-end code generator, it puts the auto-completion feature of the integrated development environment (IDE) on steroids. Developed in Rust by Jacob Jackson when he was a student at the University of Waterloo, Tabnine has evolved into a full-fledged AI-based code completion tool.

Tabnine supports over 20 languages and 15 editors, including popular IDEs like VS Code, IntelliJ, Android Studio, and even Vim. It is available for $432 per year for a team of 3 developers.


3. CodeT5

CodeT5 is an open-source programming language model built by Salesforce researchers. It is based on Google’s T5 (Text-to-Text Transfer Transformer) framework. To train CodeT5, the team collected over 8.35 million code instances, including user comments, from publicly available GitHub repositories. Most of this data came from the CodeSearchNet dataset, which covers Ruby, JavaScript, Go, Python, Java, and PHP, in addition to two C and C# datasets from BigQuery.

CodeT5 can potentially bring three functionalities to software programming:

  • Text-to-code generation: generate code from a natural language description
  • Code auto-completion: complete the whole body of a function given the name of the target function
  • Code summarization: generate a natural language description of a function
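
To make the three task shapes concrete, the sketch below pairs a hypothetical input with the kind of output each task expects. The function, the description, and the `TaskExample` structure are illustrative assumptions; nothing here is taken from CodeT5's training data or generated by the model.

```python
from dataclasses import dataclass


@dataclass
class TaskExample:
    """One input/output pair for a code-related text-to-text task."""
    task: str
    model_input: str
    expected_output: str


# Hypothetical pieces of a single small function, reused across all three tasks.
DESCRIPTION = "Return True if the text reads the same forwards and backwards."
SIGNATURE = "def is_palindrome(text):"
BODY = "    return text == text[::-1]"
FUNCTION = SIGNATURE + "\n" + BODY

EXAMPLES = [
    # Natural language in, code out.
    TaskExample("text-to-code generation", DESCRIPTION, FUNCTION),
    # Function name (signature) in, function body out.
    TaskExample("code auto-completion", SIGNATURE, BODY),
    # Code in, natural language out.
    TaskExample("code summarization", FUNCTION, DESCRIPTION),
]

if __name__ == "__main__":
    for ex in EXAMPLES:
        print(f"[{ex.task}]")
        print(f"  input:  {ex.model_input!r}")
        print(f"  output: {ex.expected_output!r}")
```

All three tasks use the same text-in, text-out format and differ only in which side is code and which side is natural language.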

4. PolyCoder

PolyCoder is an open source alternative to OpenAI Codex. Developed by researchers at Carnegie Mellon University, the model is based on OpenAI’s GPT-2 architecture and was trained on a 249 GB codebase written in 12 programming languages. According to the authors of PolyCoder, the model is able to write C with greater accuracy than any other model, including Codex.

While most code generators are not open source, PolyCoder is one of the premier open source code generation models.

5. Cogram

Cogram, a Berlin-based startup backed by Y Combinator, offers a code generation tool for data scientists and Python programmers who work with SQL queries and Jupyter notebooks. Data scientists can write queries in English, which the tool translates into complex SQL queries with joins and groupings. It supports SQLite, PostgreSQL, MySQL, and Amazon Redshift.
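
As a rough illustration of what an English-to-SQL translation with a join and a grouping might look like, here is a minimal sketch that runs against an in-memory SQLite database. The English request, the `customers` and `orders` tables, and the SQL itself are hand-written assumptions for this example; none of it is actual Cogram output.

```python
import sqlite3

# Hypothetical English request a data scientist might type.
PROMPT = "Total order amount per customer, highest first"

# The kind of SQL such a request could map to (hand-written here,
# not produced by Cogram): one join and one grouping.
SQL = """
SELECT c.name, SUM(o.amount) AS total
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.name
ORDER BY total DESC
"""


def run_demo():
    """Build a tiny in-memory database and run the query against it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            amount REAL
        );
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO orders VALUES (1, 1, 30.0), (2, 1, 20.0), (3, 2, 15.0);
    """)
    rows = conn.execute(SQL).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(PROMPT)
    for name, total in run_demo():
        print(name, total)
```

Running generated SQL against a small throwaway database like this is a cheap way to sanity-check it before pointing it at production data.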

Python and Julia developers can integrate Cogram with Jupyter notebooks to automatically generate code. The tool can generate contextual code for a specific task based on comments. Data scientists can even generate visualizations with common Python modules such as Matplotlib, Plotly, or Seaborn.

