Image credit: Ghost in the Shell (2017)

Alexander Vipond

OpenAI, a San Francisco-based AI research organisation, has created a new text generator called GPT-2. The transformer-based language modelling system can generate large amounts of high-quality synthetic text and perform reading comprehension, machine translation and summarisation in seven languages.

GPT-2 is designed to predict the next word in a text, given all the words that precede it. Trained on 8 million webpages, it performs tasks in a zero-shot setting, meaning without any task-specific training, and it can extrapolate short text samples into full articles from prompts containing entirely new information.
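To make the idea of next-word prediction concrete, here is a minimal toy sketch in Python. It is not GPT-2 and not OpenAI's code: it swaps the transformer for a simple table of word-pair counts, and the function names `train_bigram_counts` and `generate`, along with the tiny corpus, are hypothetical choices made purely for illustration. What it does share with GPT-2 is the basic loop of extending a prompt one predicted word at a time.

```python
from collections import Counter, defaultdict


def train_bigram_counts(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


def generate(followers, prompt, max_new_words=10):
    """Extend the prompt one word at a time with the most frequent follower."""
    words = prompt.split()
    if not words:
        return prompt  # nothing to continue from
    for _ in range(max_new_words):
        candidates = followers.get(words[-1])
        if not candidates:
            break  # the last word was never followed by anything in training
        # Greedy choice: take the single most frequent next word.
        next_word, _count = candidates.most_common(1)[0]
        words.append(next_word)
    return " ".join(words)


# Hypothetical toy corpus; GPT-2 was trained on roughly 8 million webpages.
corpus = "the train left the station and the train left early"
model = train_bigram_counts(corpus)
print(generate(model, "the", max_new_words=2))
```

A real language model conditions on the whole preceding text rather than only the last word, and usually samples from a probability distribution instead of always taking the top candidate, which is why its output reads far more fluently than anything this sketch can produce.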

As a general-purpose system, GPT-2 has beaten several records held by domain-specific language modelling systems, and it significantly lowers the complexity and cost of deploying scalable, customisable text generation.

Here’s an example:

Sample prompt: A train carriage containing controlled nuclear materials was stolen in Cincinnati today. Its whereabouts are unknown.

Model Completion (machine-written, first try)

The incident occurred on the downtown train line, which runs from Covington and Ashland stations.

In an email to Ohio news outlets, the U.S. Department of Energy said it is working with the Federal Railroad Administration to find the thief.

“The theft of this nuclear material will have significant negative consequences on public and environmental health, our workforce and the economy of our nation,” said Tom Hicks, the U.S. Energy Secretary, in a statement. “Our top priority is to secure the theft and ensure it doesn’t happen again.”

The stolen material was taken from the University of Cincinnati’s Research Triangle Park nuclear research site, according to a news release from Department officials.

The Nuclear Regulatory Commission did not immediately release any information.

According to the release, the U.S. Department of Energy’s Office of Nuclear Material Safety and Security is leading that team’s investigation.

“The safety of people, the environment and the nation’s nuclear stockpile is our highest priority,” Hicks said. “We will get to the bottom of this and make no excuses.”

The system constructs a story, creating fake institutions, places and quotations and blending them to produce the illusion of authenticity. The model isn’t perfect, and it may take several tries to get a human-quality passage of text, but the speed and scale of the system are such that OpenAI has withheld the full system from publication.

This is because GPT-2 could be used by malicious actors to wage large-scale information warfare and undermine one of the central principles of the cybersecurity triad: the integrity of information.

OpenAI has acknowledged this threat, citing GPT-2’s capacity to generate fake news en masse, impersonate others, and automate the production of fake content, phishing and spam. By tweaking the system, one can produce an endless stream of positively or negatively angled articles. It is also possible to customise it for specific issues to improve the apparent veracity of the supporting information in the synthetic content it produces, making it all the more difficult to tell fact from fiction. OpenAI has stated that it expects tools like GPT-2 to be available within the next two years.

As dictatorships and authoritarian regimes actively seek to spread misinformation to disrupt elections, obfuscate wars, and insist assassins prefer to spend their time admiring English churches, GPT-2 is a highly attractive tool and a warning of what’s to come.

The malicious use of AI tools will challenge the integrity of the global digital commons, fuelled by states that view the open flow of information as a threat to their governance. The tools will then be passed down to organised crime and developing regimes. As the recent case of Project Raven shows, even as countries increasingly try to secure their intellectual property, their cyber tools and tactics are up for sale.

As William Gibson once said, “the future is already here, it’s just unevenly distributed”. So now that we know the threat is here, what can we do to counter the risks at the different levels of its distribution?

OpenAI will continue their research.