Ethereum Smart Contract Decompiler

Update: December 6:
– The full EVM decompiler will ship with JEB 3.0-beta.8
– Download a sample JEB Python script showing how to use the API
Update: November 20:
– The demo build JEB 3.0-beta.6.20181120 ships with important EVM decompiler updates – download/update here.
– We uploaded the decompiled code of an interested contract, the second part of the PolySwarm challenge (a good write-up can be found here)

We’re excited to announce that the pre-release of our Ethereum smart contract decompiler is available. We hope that it will become a tool of choice for security auditors, vulnerability researchers, and reverse engineers examining opaque smart contracts running on Ethereum platforms.

TL;DR: Download the demo build and start reversing contracts

Keep on reading to learn about the current features of the decompiler; how to use it and understand its output; its current limitations, and planned additions.

This opaque multisig wallet is holding more than USD $22 million as of 10/26/2018 (on mainnet, address 0x3DBB3E8A5B1E9DF522A441FFB8464F57798714B1)

Overall decompiler features

The decompiler modules provide the following specific capabilities:

    • The EVM decompiler takes compiled smart contract EVM 1 code as input, and decompiles them to Solidity-like source code.
    • The initial EVM code analysis passes determine contract’s public and private methods, including implementations of public methods synthetically generated by compilers.
    • Code analysis attempts to determine method and event names and prototypes, without access to an ABI.
  • The decompiler also attempts to recover various high-level constructs, including:
      • Implementations of well-known interfaces, such as ERC20 for standard tokens, ERC721 for non-fungible tokens, MultiSigWallet contracts, etc.
      • Storage variables and types
    • High-level Solidity artifacts and idioms, including:
        • Function mutability attributes
      • Function payability state
      • Event emission, including event name
      • Invocations of address.send() or address.transfer()
      • Precompiled contracts invocations

On top of the above, the JEB back-end and client platform provide the following standard functionality:

    • The decompiler uses JEB’s optimizations pipeline to produce high-level clean code.
    • It uses JEB code analysis core features, and therefore permits: code refactoring (eg, consistently renaming methods or fields), commenting and annotating, navigating (eg, cross references), typing, graphing, etc.
    • Users have access to the intermediate-level IR representation as well as high-level AST representations though the JEB API.
  • More generally, the API allows power-users to write extensions, ranging from simple scripts in Python to complex plugins in Java.

Our Ethereum modules were tested on thousands of smart contracts active on Ethereum mainnet and testnets.

Basic usage

Open a contract via the “File, Open smart contract…” menu entry.

You will be offered two options:

  • Open a binary file already stored on disk
  • Download 2 and open a contract from one of the principal Ethereum networks: mainnet, rinkeby, ropsten, or kovan:
    • Select the network
    • Provide the contract 20-byte address
    • Click Download and select a file destination
Open a contract via the File, Open smart contract menu entry

Note that to be recognized as EVM code, a file must:

  • either have a “.evm-bytecode” extension: in this case, the file may contain binary or hex-encoded code;
  • or have be a “.runtime” or “.bin-runtime” extension (as generated by the solc Solidity compiler), and contain hex-encoded Solidity generated code.

If you are opening raw files, we recommend you append the “.evm-extension” to them in order to guarantee that they will be processed as EVM contract code.

Contract Processing

JEB will process your contract file and generate a DecompiledContract class item to represent it:

The Assembly view on the right panel shows the processed code.

To switch to the decompiled view, select the “Decompiled Contract” node in the Hierarchy view, and press TAB (or right-click, Decompile).

Right-click on items to bring up context menus showing the principal commands and shortcuts.
The decompiled view of a contract.

The decompiled contract is rendered in Solidity-like code: it is mostly Solidity code, but not entirely; constructs that are illegal in Solidity are used throughout the code to represent instructions that the decompiler could not represent otherwise. Examples include: low-level statements representing some low-level EVM instructions, memory accesses, or very rarely, goto statements. Do not expect a DecompiledContract to be easily recompiled.

Code views

You may adjust the View panels to have side-by-side views if you wish to navigate the assembly and high-level code at the same time.

  • In the assembly view, within a routine, press Space to visualize its control flow graph.
  • To navigate from assembly to source, and back, press the TAB key. The caret will be positioned on the closest matching instruction.
Side by side views: assembly and source

Contract information

In the Project Explorer panel, double click the contract node (the node with the official Ethereum Foundation logo), and then select the Description tab in the opened view to see interesting information about the processed contract, such as:

  • The detected compiler and/or its version (currently supported are variants of Solidity and Vyper compilers).
  • The list of detected routines (private and public, with their hashes).
  • The Swarm hash of the metadata file, if any.
The contract was identified as being compiled with Solidity <= 0.4.21

Commands

The usual commands can be used to refactor and annotate the assembly or decompiled code. You will find the exhaustive list in the Action and Native menus. Here are basic commands:

  • Rename items (methods, variables, globals, …) using the N key
  • Navigate the code by examining cross-references, using the X key (eg, find all callers of a method and jump to one of them)
  • Comment using the Slash key
  • As said earlier, the TAB key is useful to navigate back and forth from the low-level EVM code to high-level decompiled code

We recommend you to browser the general user manual to get up to speed on how to use JEB.

Rename an item (eg, a variable) by pressing the N key

Remember that you can change immediate number bases and rendering by using the B key. In the example below, you can see a couple of strings present in the bad Fomo3D contract, initially rendered in Hex:

All immediates are rendered as hex-strings by default.
Use the B key to cycle through base (10, 16, etc.) and rendering (number, ascii)

Understanding decompiled contracts

This section highlights idioms you will encounter throughout decompiled pseudo-Solidity code. The examples below show the JEB UI Client with an assembly on the left side, and high level decompiled code on the right side. The contracts used as examples are live contracts currently active Ethereum mainnet.

We also highlight current limitations and planned additions.

Dispatcher and public functions

The entry-point function of a contract, at address 0, is generally its dispatcher. It is named start() by JEB, and in most cases will consist in an if-statement comparing the input calldata hash (the first 4 bytes) to pre-calculated hashes, to determine which routine is to be executed.

  • JEB attempts to determine public method names by using a hash dictionary (currently containing more than 140,000 entries).
  • Contracts compiled by Solidity generally use synthetic (compiler generated) methods as bridges between public routines, that use the public Ethereum ABI, and internal routines, using a compiler-specific ABI. Those routines are identified as well and, if their corresponding public method was named, will be assigned a similar name __impl_{PUBLIC_NAME}.

NOTE/PLANNED ADDITION: currently, JEB does not attempt to process input data of public routines and massage it back into an explicit prototype with regular variables. Therefore, you will see low-level access to CALLDATA bytes within public methods.

A dispatcher.

Below, see the public method collectToken(), which is retrieving its first parameter – a 20 byte address – from the calldata.

A public method reading its arguments from CALLDATA bytes.

Interface discovery

At the time of writing, implementation of the following interfaces can be detected: ERC20, ERC165, ERC721, ERC721TokenReceiver, ERC721Metadata, ERC721Enumerable, ERC820, ERC223, ERC777, TokenFallback used by ERC223/ERC777 interfaces, as well as the common MultiSigWallet interface.

Eg, the contract below was identified as an ERC20 token implementation:

This contract implements all methods specified by the ERC20 interface.

Function attributes

JEB does its best to retrieve:

  • low-level state mutability attributes (pure, read-only, read-write)
  • the high-level Solidity ‘payable’ attribute, reserved for public methods

Explicitly non-payable functions have lower-level synthetic stubs that verify that no Ether is being received. They REVERT if it is is the case. If JEB decides to remove this stub, the function will always have an inline comment /* non payable */ to avoid any ambiguity.

The contract below shows two public methods, one has a default mutability state (non-payable); the other one is payable. (Note that the hash 0xFF03AD56 was not resolved, therefore the name of the method is unknown and was set to sub_AF; you may also see a call to the collect()’s bridge function __impl_collect(), as was mentioned in the previous section).

Two public methods, one is payable, the other is not and will revert if it receives Ether.

Storage variables

The pre-release decompiler ships with a limited storage reconstructor module.

  • Accesses to primitives (int8 to int256, uint8 to uint256) is reconstructed in most cases
  • Packed small primitives in storage words are extracted (eg, a 256-bit storage word containing 2x uint8 and 1x int32, and accessed as such throughout the code, will yield 3 contract variables, as one would expect to see in a Solidity contract
Four primitive storage variables were reconstructed.

However, currently, accesses to complex storage variables, such as mappings, mappings of mappings, mappings of structures, etc. are not simplified. This limitation will be addressed in the full release.

When a storage variable is not resolved, you will see simple “storage[…]” assignments, such as:

Unresolved storage assignment, here, to a mapping.

Due to how storage on Ethereum is designed (a key-value store of uint256 to uint256), Solidity internally uses a two-or-more indirection level for computing actual storage keys. Those low-level storage keys depend on the position of the high level storage variables. The KECCAK256 opcode is used to calculate intermediate and final keys. We will detail this mechanism in detail in a future blog post.

Precompiled contracts

Ethereum defines four pre-compiled contracts at addresses 1, 2, 3, 4. (Other addresses (5-8) are being reserved for additional pre-compiled contracts, but this is still at the ERC stage.)

JEB identifies CALLs that will eventually lead to pre-compiled code execution, and marks them as such in decompiled code: call_{specific}.

The example below shows the __impl_Receive (named recovered) method of the 34C3 CTF contract, which calls into address #2, a pre-compiled contract providing a fast implementation of SHA-256.

This contract calls address 2 to calculate the SHA-256 of a binary blob.

Ether send()

Solidity’s send can be translated into a lower-level call with a standard gas stipend and zero parameters. It is essentially used to send Ether to a contract through the target contract fallback function.

NOTE: Currently, JEB renders them as send(address, amount) instead of address.send(amount)

The contract below is live on mainnet. It is a simple forwarder, that does not store ether: it forwards the received amount to another contract.

This contract makes use of address.send(…) to send Ether

Ether transfer()

Solidity’s transfer is an even higher-level variant of send that checks and REVERTs with data if CALL failed. JEB identifies those calls as well.

NOTE: Currently, JEB renders them as transfer(address, amount) instead of address.transfer(amount)

This contract makes use of address.transfer(…) to send Ether

Event emission

JEB attempts to partially reconstruct LOGx (x in 1..4) opcodes back into high-level Solidity “emit Event(…)”. The event name is resolved by reversing the Event method prototype hash. At the time of writing, our dictionary contains more than 20,000 entries.

If JEB cannot reverse a LOGx instruction, or if LOG0 is used, then a lower-level log(…) call will be used.

NOTE: currently, the event parameters are not processed; therefore, the emit construct used in the decompiled code has the following form: emit Event(memory, size[, topic2[, topic3[, topic4]]]). topic1 is always used to store the event prototype hash.

An Invocation of LOG4 reversed to an “emit Deposit(…)” event emission

API

JEB API allows automation of complex or repetitive tasks. Back-end plugins or complex scripts can be written in Python or Java. The API update that ship with JEB 3.0-beta.6 allow users to query decompiled contract code:

  • access to the intermediate representation (IR)
  • access to the final Solidity-like representation (AST)

API use is out-of-scope here. We will provide examples either in a subsequent blog post or on our public GitHub repository.

Conclusion

As said in the introduction, if you are reverse engineering opaque contracts (that is, most contracts on Ethereum’s mainnet), we believe you will find JEB useful.

You may give a try to the pre-release by downloading the demo here. Please let us know your feedback: we are planning a full release before the end of the year.

As always, thank you to all our users and supporters. -Nicolas

  1. EVM: Ethereum Virtual Machine
  2. This Open plugin uses Etherscan to retrieve the contract code

Android NDK Libraries Signatures

In this blog post, we present a new batch of native signatures released with JEB3 to identify Android Native Development Kit (NDK) libraries.

First, let’s briefly give some context. The Android NDK is a set of tools allowing developers to embed compiled C/C++ code into their Android applications. Thus, developers can integrate existing native code libraries, develop performance-sensitive code in C/C++ or obfuscate algorithms with native code protectors.

In practice, native code within Android applications comes in the form of ELF shared libraries (“.so”); the native methods can then be called from Java using Java Native Interface (JNI), which we described in a previous blog post.

NDK Pre-Built Libraries

Android NDK provides some pre-built libraries that can be linked against. For example, there are several C++ Standard Template Library (STL) 1 , or the Zlib decompression library.

As an example, let’s compile a “hello world” Android NDK C++ library with NDK r17. By default, the C++ implementation will be gnustl — the default choice before NDK r18.

Here is the C++ code:

When compiled with Android Studio’s default settings, libraries are linked dynamically, and libgnustl_shared.so is directly included in the application — because it is not a system library –, for each supported Application Binary Interface (ABI).

Files hierarchy of the Android application containing our “hello world” native library

If we open the ARM library we can pretty easily understand the — already convoluted — logic of our “hello world” routine, thanks to the names of gnustl external API calls:

Control-flow graph of ARM “hello world” with gnustl dynamically linked. Note that JEB displays mangled names when API calls correspond to external routines.

Now, Android NDK also provides static versions for most of the pre-built libraries. A developer — especially a malware developer wishing to hinder analysis — might prefer to use those.

When compiled in static mode, gnustl library is now ‘included’ in our native library, and here is our “hello world” routine:

Control-flow graph of ARM “hello world” with gnustl statically linked. Subroutines bear no specific names.

In this case, the analysis will be slowed down by the numerous routine calls with no specific names; each of this subroutine will need to be looked at to understand the whole purpose.

This brings us to a common reverse-engineering problem: is there a way to automatically identify and rename static library code, such that the analyst can focus on the application code?

JEB3 NDK Signatures

That’s when JEB native signatures come to the rescue! Indeed JEB3 now provides signatures for the following Android NDK  static libraries:

  • gnustl
  • libc++
  • STLport
  • libc
  • libmath
  • zlib

We provide signatures for ARM/ARM64 ABIs (including all variants like arm-v7a, arm-v7a-hard, thumb or ARM mode, etc) of these libraries, from NDK r10 to NDK r18.

These signatures are built in a similar fashion to our x86/x64 Visual Studio native signatures, and are intended to be “false-positive free”, which means a match should be blindly trustable. Note that JEB users can create their own signatures directly from the UI.

So, within JEB, if we open our statically-linked library with the signatures loaded, gnustl library routines are identified and renamed:

Control-flow graph of ARM “hello world” with gnustl statically linked and NDK signatures loaded. Subroutines have been renamed.

Note: the attentive reader might have noticed some “unk_lib_subX” routines in the previous image. Those names correspond to cases where several library routines match the routine. The user can then see the conflicting names in the target routine and use the most suitable one.

Due to the continuous evolution of compilers and libraries, it is not an easy task to provide up-to-date and useful signatures, but we hope this first NDK release will help our users. Nevertheless, more libraries should certainly be signed in the future, and we encourage users to comment on that  (email, Twitter, Slack).

  1.  NDK C++ support is a turbulent story, to say the least. Historically, different implementations of C++ have been provided with the NDK (gnustl, STLport, libc++,…), each of them coming with a different set of features (exceptions handling, RTTI…). Since the very recent r18 version (released in september 2018) Android developers must now use only libc++.