Skip to main content

Researchers propose AI system that reverse-engineers black box apps

lock on gray background

Join us in Atlanta on April 10th and explore the landscape of security workforce. We will explore the vision, benefits, and use cases of AI for security teams. Request an invite here.


A preprint paper published by researchers at DeepMind and the CISPA Helmholtz Center for Information Security describes an AI system capable of reverse-engineering the black box functions of programs written in educational programming language Karel. Given access only to the inputs and outputs (I/Os) of an application, they claim the system — dubbed IReEn — can iteratively improve a copy of the target application until it becomes functionally equivalent to the original.

Reverse-engineering might carry a nefarious connotation in some circles, but it isn’t without legitimate applications. For instance, it can help recover software if the source code was lost or aid in the detection and neutralization of malware. Although several machine learning-driven reverse-engineering techniques have been proposed, most can’t recover functional and human-interpretable forms of programs. But IReEn can.

IReEn obtains a set of I/Os by querying the target program’s functions using random inputs drawn from a distribution. A module called a neural program synthesizer — conditioned on the obtained I/Os — outputs clone programs and uses a scoring system to rate the clones in terms of closeness to the original. If the best candidate doesn’t cover all of the I/Os, the system selects a subset of I/Os that weren’t covered by the best candidate and conditions them on a program synthesizer for the next iteration.

AI reverse-engineering

Above: Revealing the functionality of a black box function using only input-output interactions.

To evaluate their approach, the coauthors considered Karel, which uses structures that make reverse-engineering applications programmed in it a challenge. Using an open source data set containing over 1.1 million pairs of I/Os and programs, they trained the neural synthesizer, reserving a subset of data for validation and testing.

VB Event

The AI Impact Tour – Atlanta

Continuing our tour, we’re headed to Atlanta for the AI Impact Tour stop on April 10th. This exclusive, invite-only event, in partnership with Microsoft, will feature discussions on how generative AI is transforming the security workforce. Space is limited, so request an invite today.
Request an invite

The team reports that when applied to 100 I/Os that weren’t included in the training data, IReEn generated functionally equivalent programs with a 78% success rate. “In contrast to prior work, we propose an iterative neural program synthesis scheme [that] is the first to tackle the task of reverse-engineering in a black box setting without any access to privileged information,” they wrote. “Despite the weaker assumptions, and hence the possibility to use our method broadly in other fields, we show that in many cases it is possible to reverse-engineer functionally equivalent programs on the Karel data set benchmark.”

The work partly builds on Nvidia’s GameGAN, which can synthesize a functional version of a game without an underlying engine. Given sequences of frames from a game and the corresponding actions taken by agents (i.e., players) within the game, the system visually imitates the game using a trained AI model.

VB Daily - get the latest in your inbox

Thanks for subscribing. Check out more VB newsletters here.

An error occured.