Meta built a code-generating AI model similar to Copilot

Meta says it’s created a generative AI tool for coding similar to GitHub’s Copilot.

The company made the announcement at an event focused on its AI infrastructure efforts, including custom chips Meta’s building to accelerate the training of generative AI models. The coding tool, called CodeCompose, isn’t available publicly — at least not yet. But Meta says its teams use it internally to get code suggestions for Python and other languages as they type in IDEs like VS Code.

“The underlying model is built on top of public research from [Meta] that we have tuned for our internal use cases and codebases,” Michael Bolin, a software engineer at Meta, said in a prerecorded video. “On the product side, we’re able to integrate CodeCompose into any surface where our developers or data scientists work with code.”

The largest of several CodeCompose models Meta trained has 6.7 billion parameters, a little over half the number of parameters in the model on which Copilot is based. Parameters are the parts of the model learned from historical training data and essentially define the skill of the model on a problem, such as generating text.

CodeCompose was fine-tuned on Meta’s first-party code, including internal libraries and frameworks written in Hack, a Meta-developed programming language, so it can incorporate those into its programming suggestions. And its base training data set was filtered of poor coding practices and errors, like deprecated APIs, to reduce the chance that the model recommends a problematic slice of code.
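Meta hasn't said how that filtering was done. As a rough illustration only, one way to drop training samples that call deprecated APIs is to parse each file and check its function calls against a deny-list. The sketch below assumes Python source files, and the names in `DEPRECATED_CALLS` are hypothetical placeholders, not real Meta APIs.

```python
import ast

# Hypothetical deny-list; these names are invented for illustration.
DEPRECATED_CALLS = {"legacy_fetch", "old_connect"}


def called_names(source: str) -> set[str]:
    """Return the names of all functions and methods called in the source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):         # e.g. legacy_fetch(...)
                names.add(func.id)
            elif isinstance(func, ast.Attribute):  # e.g. client.old_connect(...)
                names.add(func.attr)
    return names


def keep_sample(source: str) -> bool:
    """Keep a sample only if it parses and calls nothing on the deny-list."""
    try:
        names = called_names(source)
    except SyntaxError:
        return False  # code that does not parse is dropped as well
    return names.isdisjoint(DEPRECATED_CALLS)


samples = ["x = new_fetch(1)", "x = legacy_fetch(1)", "def (:"]
clean = [s for s in samples if keep_sample(s)]
```

A name-based check like this is deliberately crude: it would miss aliased imports and could flag unrelated functions that happen to share a name.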

In practice, CodeCompose suggests things like annotations and import statements as a user types. It can complete a single line or several lines at once, and can optionally fill in large blocks of code.

“CodeCompose can take advantage of the surrounding code to provide better suggestions,” Bolin continued. “It can also use code comments as a signal in generating code.”
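To make the idea of using surrounding code concrete, here is a minimal sketch of how an editor plugin might package the text before and after the cursor into a single infilling prompt. It is not CodeCompose's actual interface: the `<MASK>` sentinel, the prompt layout, and both function names are assumptions made for illustration.

```python
# Hypothetical sentinel marking the gap the model should fill.
# Real infilling models define their own special tokens.
MASK = "<MASK>"


def build_infill_prompt(source: str, cursor: int) -> str:
    """Mark the cursor position so the model sees code on both sides of it."""
    if not 0 <= cursor <= len(source):
        raise ValueError("cursor is outside the source text")
    prefix, suffix = source[:cursor], source[cursor:]
    # The trailing sentinel asks the model to generate the missing span.
    return f"{prefix}{MASK}{suffix}{MASK}"


def apply_completion(source: str, cursor: int, completion: str) -> str:
    """Splice an accepted suggestion back into the file at the cursor."""
    return source[:cursor] + completion + source[cursor:]


source = "# add two numbers\ndef add(a, b):\n    return \n"
cursor = source.index("return ") + len("return ")
prompt = build_infill_prompt(source, cursor)
updated = apply_completion(source, cursor, "a + b")
```

Because the comment and the function signature both land in the prompt, a model has more to go on than the current line alone, which is the effect Bolin describes.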

Meta claims that thousands of employees are accepting suggestions from CodeCompose every week and that the acceptance rate is over 20%.
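Meta didn't define how it measures acceptance. Assuming the common definition (suggestions accepted divided by suggestions shown), the arithmetic looks like the sketch below. The counts are invented purely to illustrate the calculation and are not Meta's figures.

```python
def acceptance_rate(shown: int, accepted: int) -> float:
    """Fraction of displayed suggestions that developers accepted."""
    if shown <= 0:
        raise ValueError("shown must be positive")
    if not 0 <= accepted <= shown:
        raise ValueError("accepted must be between 0 and shown")
    return accepted / shown


# Made-up counts, for illustration only.
rate = acceptance_rate(shown=1000, accepted=220)
print(f"{rate:.0%}")  # prints "22%"
```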

The company didn’t address, however, the controversies around code-generating AI.

Microsoft, GitHub and OpenAI are being sued in a class action lawsuit that accuses them of violating copyright law by allowing Copilot to regurgitate sections of licensed code without providing credit. Liability aside, some legal experts have suggested that AI like Copilot could put companies at risk if they were to unwittingly incorporate copyrighted suggestions from the tool into their production software.

It’s unclear whether CodeCompose, too, was trained on licensed or copyrighted code — even accidentally. When reached for comment, a Meta spokesperson had this to say:

“CodeCompose was trained on InCoder, which was released by Meta’s AI research division. In a paper detailing InCoder, we note that, to train InCoder, ‘We collect a corpus of (1) public code with permissive, non-copyleft, open source licenses from GitHub and GitLab and (2) StackOverflow questions, answers and comments.’ The only additional training we do for CodeCompose is on Meta’s internal code.”

Generative coding tools can also introduce insecure code. According to a recent study out of Stanford, software engineers who use code-generating AI systems are more likely to introduce security vulnerabilities into the apps they develop. While the study didn’t look at CodeCompose specifically, it stands to reason that developers who use it would face the same risk.

Bolin stressed that developers needn’t follow CodeCompose’s suggestions and that security was a “major consideration” in creating the model. “We are extremely excited with our progress on CodeCompose to date, and we believe that our developers are best served by bringing this work in house,” he added.
