Using Codeless Native Signatures

One of the exciting new features coming with JEB 4.0 is a set of signatures that identify common native libraries in a compiler-agnostic fashion.

These “codeless” signatures were built to tackle an old reverse-engineering problem: the identification of common open-source libraries in executables. Because such libraries are compiled by the developers themselves, traditional code-based signatures (like our own SigLib) need to be regenerated with the same compiler setup as the one the developers used; otherwise the signatures won’t match, because the code differs.

Therefore, identifying open-source libraries with code-based signatures is a lot of effort for a small return, because each set of signatures only matches one compiler setup (compiler version, optimization level…), and there is a vast number of them!
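
To make the problem concrete, here is a minimal Python sketch of a naive code-based signature. It assumes a signature is simply a hash of a routine’s raw bytes, which is an assumption made for illustration and not a description of how SigLib works. The two byte strings are illustrative stand-ins for the same tiny routine built at two optimization levels, and the routine name `increment` is hypothetical.

```python
import hashlib

# Illustrative stand-ins (assumption): the same one-line routine as it might be
# emitted for x86-64 without optimization and with optimization enabled.
BUILD_UNOPTIMIZED = bytes.fromhex("554889e5897dfc8b45fc83c0015dc3")
BUILD_OPTIMIZED = bytes.fromhex("8d4701c3")


def code_signature(code):
    """Naive code-based signature: a hash of the routine's raw bytes."""
    return hashlib.sha256(code).hexdigest()


# Signature database generated from the unoptimized build only.
signature_db = {code_signature(BUILD_UNOPTIMIZED): "increment"}


def identify(code):
    """Return the routine name if its signature is known, else None."""
    return signature_db.get(code_signature(code))


print(identify(BUILD_UNOPTIMIZED))  # found: same compiler setup as the database
print(identify(BUILD_OPTIMIZED))    # not found: same source, different setup
```

Covering the second build would require generating a second set of signatures, and so on for every compiler setup of interest.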

We developed codeless signatures to identify open-source libraries without the burden of regenerating signatures for each compiler setup. We are currently shipping signatures for the following libraries:

  • OpenSSL, versions 0.9.8m to 1.1.1g
  • libcurl, versions 7.30.0 to 7.71.1
  • libssh2, versions 1.8.0, 1.8.2 and 1.9.0
  • bzip2, versions 1.0.6 and 1.0.8
  • zlib, versions 1.2.3, 1.2.8, 1.2.10 and 1.2.11

The signatures can be applied to any binary opened in JEB, through the “Native > Codeless Signatures Libraries” menu.

We also ship an automatic library version identification tool (available from the “Codeless Signature Libraries” dialog), which should help decide which version of the library was linked when it is not obvious.
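
As a rough intuition for version identification, one conceivable approach is to apply each version’s signatures and rank the versions by the number of routines matched. The Python sketch below does only that; it is an assumption for illustration, not a description of how JEB’s tool decides, and the match results are hypothetical.

```python
def rank_versions(matches_per_version):
    """Rank library versions by how many routines their signatures matched.

    matches_per_version maps a version string to the set of routine names
    matched in the analyzed binary by that version's signatures.
    """
    return sorted(
        ((version, len(names)) for version, names in matches_per_version.items()),
        key=lambda item: item[1],
        reverse=True,
    )


# Hypothetical match results for a binary that links zlib.
matches = {
    "1.2.3": {"inflate", "deflate", "crc32"},
    "1.2.8": {"inflate", "deflate", "crc32", "adler32", "inflateEnd"},
    "1.2.11": {"inflate", "deflate", "crc32", "adler32"},
}

ranking = rank_versions(matches)
best_version, best_score = ranking[0]
print(best_version, best_score)  # prints: 1.2.8 5
```

When two versions score closely, as they often would for neighboring releases, the ranking is only a hint and manual confirmation is still needed.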

To build such signatures, we made some tradeoffs, notably accepting that some routines will be missed and that there will be a few false positives. We believe JEB’s codeless signatures are particularly suitable when one is not interested in the library’s internals, and therefore the only library routines whose names really matter are the ones used by the rest of the code (as in malware analysis).

Overall, our current experiments show promising results: for example, we usually identify 50-60% of OpenSSL routines, with a false-positive ratio of less than 2%, on a variety of architecture/compiler setups.
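
For readers who want to run the same kind of measurement on their own binaries, here is a small Python sketch of how the two figures can be computed against a build with known symbols. The definitions are assumptions: the identification ratio is taken as correctly named routines over all library routines, and the false-positive ratio as wrongly named routines over all routines that received a name. The addresses and names are hypothetical.

```python
def identification_stats(ground_truth, identified):
    """Return (identified_ratio, false_positive_ratio).

    ground_truth: address -> real name, for every library routine in the binary.
    identified:   address -> name assigned by the signatures.
    """
    correct = sum(
        1 for addr, name in identified.items() if ground_truth.get(addr) == name
    )
    false_positives = len(identified) - correct
    identified_ratio = correct / len(ground_truth)
    false_positive_ratio = false_positives / len(identified) if identified else 0.0
    return identified_ratio, false_positive_ratio


# Hypothetical data: 100 library routines, 55 correctly named, 1 wrong match.
ground_truth = {0x1000 + 0x10 * i: "routine_%d" % i for i in range(100)}
identified = dict(list(ground_truth.items())[:55])
identified[0x9000] = "routine_3"  # a non-library routine that was wrongly named

identified_ratio, false_positive_ratio = identification_stats(ground_truth, identified)
print("identified: %.0f%%, false positives: %.1f%%"
      % (100 * identified_ratio, 100 * false_positive_ratio))
# prints: identified: 55%, false positives: 1.8%
```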

We will describe in detail the internals of JEB’s codeless signatures in an upcoming whitepaper, but in the meantime we made a video to demonstrate how to use them:

We really encourage you to test JEB’s codeless signatures and report feedback through the usual channels:

JEB’s GENDEC IR Emulation for Auto-Decryption of Data Items

Under some circumstances, JEB’s generic decompiler is able to detect inline decryptors, and subsequently attempt to emulate the underlying IR to generate plaintext data items, both in the disassembly view and, most importantly, decompiled views.[1]

This feature is available starting with JEB 4.0.3-beta. It makes use of the IREmulator object, available in the public API for scripting and plugins.

Here’s an example of a protected ELF file[2] (aarch64) that was encountered a few months ago:

Disassembly of the target routine

GENDEC’s unsafe optimizers are enabled by default. Let’s disable them before performing a first decompilation, in order to see what the inline decryptor looks like.

To bring up the decompilation options on demand, use CTRL+TAB (or Command+TAB), or alternatively, menu Action, command Decompile with Options.

Decompilation #1: unsafe optimizers disabled

That decryptor’s control flow is obfuscated (flattened, controlled by the state variable v5). It is called once, depending on the boolean value at 0x2F227. Here, the decrypted contents are used by system_property_get.
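
For readers unfamiliar with the pattern, here is a toy Python model of such an inline decryptor: a guard flag makes it run only once, and the decryption loop is flattened into a dispatcher driven by a state variable. The XOR cipher, the key, the memory layout and the plaintext are all hypothetical; the routine in the sample is not reproduced here.

```python
KEY = 0x5A  # hypothetical single-byte key

# Toy memory: byte 0 is the "already decrypted" flag, the rest is ciphertext.
memory = bytearray(b"\x00" + bytes(c ^ KEY for c in b"hello"))


def get_string(mem):
    """Decrypt in place on first use, then return the plaintext bytes."""
    if mem[0] == 0:              # guard flag: decrypt only once
        state, i = 0, 1
        while state != 3:        # flattened loop: a dispatcher on `state`
            if state == 0:       # loop condition
                state = 1 if i < len(mem) else 2
            elif state == 1:     # loop body: decrypt one byte
                mem[i] ^= KEY
                i += 1
                state = 0
            elif state == 2:     # loop exit: remember that decryption is done
                mem[0] = 1
                state = 3
    return bytes(mem[1:])


print(get_string(memory))  # first call decrypts in place
print(get_string(memory))  # second call skips the decryptor
```

The original loop structure (condition, body, exit) is still there, but it can only be recovered by following the values taken by `state`, which is what makes flattened code tedious to read.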

Below, the contents in virtual memory, pre-decryption:

Encrypted contents.

Let’s perform another decompilation of the same routine, with the unsafe optimizers enabled this time. GENDEC will now:

  • detect something that potentially could be decryption code
  • start emulating the underlying IR of that portion of code (the IR is not visible here, but you can easily read/write the Intermediate Representation via the API)
  • collect and apply results
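
The three steps above can be sketched in a few lines of Python over a made-up, four-instruction IR. Everything in this sketch is an assumption made for illustration: the IR opcodes, the detection heuristic, the address and the XOR key are hypothetical, and none of it uses or mirrors JEB’s IREmulator API.

```python
def looks_like_decryptor(ir):
    """Step 1 (toy heuristic): a memory XOR inside a backward-jumping loop."""
    has_mem_xor = any(op[0] == "xor_mem" for op in ir)
    has_back_jump = any(op[0] == "jlt" and op[3] <= i for i, op in enumerate(ir))
    return has_mem_xor and has_back_jump


def emulate(ir, memory, max_steps=10000):
    """Step 2: run the toy IR on a private copy of memory.

    Returns {address: new_byte} for every byte the code wrote.
    """
    mem, regs, writes = dict(memory), {}, {}
    pc = steps = 0
    while pc < len(ir):
        steps += 1
        if steps > max_steps:
            raise RuntimeError("emulation budget exceeded")
        op = ir[pc]
        if op[0] == "set":        # ("set", reg, const)
            regs[op[1]] = op[2]
        elif op[0] == "xor_mem":  # ("xor_mem", addr_reg, key)
            addr = regs[op[1]]
            mem[addr] ^= op[2]
            writes[addr] = mem[addr]
        elif op[0] == "add":      # ("add", reg, const)
            regs[op[1]] += op[2]
        elif op[0] == "jlt":      # ("jlt", reg, const, target)
            if regs[op[1]] < op[2]:
                pc = op[3]
                continue
        pc += 1
    return writes


BASE, KEY = 0x1000, 0x33          # hypothetical address and key
PLAINTEXT = b"secret"             # hypothetical plaintext
vm = {BASE + i: b ^ KEY for i, b in enumerate(PLAINTEXT)}  # encrypted image

ir = [
    ("set", "p", BASE),                      # p = BASE
    ("xor_mem", "p", KEY),                   # mem[p] ^= KEY
    ("add", "p", 1),                         # p += 1
    ("jlt", "p", BASE + len(PLAINTEXT), 1),  # loop while p < end
]

if looks_like_decryptor(ir):
    results = emulate(ir, vm)
    vm.update(results)            # step 3: apply the results to the memory image

recovered = bytes(vm[BASE + i] for i in range(len(PLAINTEXT)))
print(recovered)
```

Emulating on a copy and applying the collected writes afterwards keeps a failed or runaway emulation from corrupting the memory image, hence the step budget in `emulate`.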

See the decrypted contents below. (A data item existed beforehand at 0x2F137, and the decompiler chose not to erase it.) The decompiled code on the right panel no longer shows the decryption loop: an optimizer has discarded it since it can no longer be executed.

Decompilation #2: unsafe optimizers enabled

We may convert the data item (or bytes) to a string by pressing the A key (menu Native, command Create String). The decompiled code will pick it up and refresh the AST as well.
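
Conceptually, creating a string over decrypted bytes amounts to reading up to the first NUL terminator. A minimal Python sketch, assuming an ASCII, NUL-terminated string; the memory contents and the offset below are hypothetical placeholders, not the bytes from the sample:

```python
def read_c_string(memory, offset):
    """Read a NUL-terminated ASCII string starting at `offset`."""
    end = memory.index(b"\x00", offset)  # raises ValueError if no terminator
    return memory[offset:end].decode("ascii")


# Hypothetical decrypted image: two unrelated bytes, then a C string.
image = b"\x01\x02example.text\x00\xff"
text = read_c_string(image, 2)
print(text)  # prints: example.text
```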

The final result looks like:

The VM and decompiled view show the decrypted code, “”

A few additional comments:

  • This optimizer is considered unsafe[3] because it is allowed to modify the VM of the underlying native code unit, as seen above.
  • The optimizer is generic (architecture-agnostic). It performs its work on the underlying IR mid-stage in the decompilation pipeline, when various optimizations are applied.
  • It makes use of public API methods only, mostly the IREmulator class. Advanced users can write similar optimizers if they choose to. (We will also publish the code of this optimizer on GitHub shortly, as it will serve as a good real-life example of how to use the IR emulator to write powerful optimizers. It’s slightly more than 100 lines of Java.)

We hope you enjoy using JEB 4 Beta. There is a license type for everyone, so feel free to try things out. Do not hesitate to reach out to us on Twitter, Slack, or privately over email! Thanks, and until next time 🙂

  1. Users familiar with JEB’s Dex decompilers will remember that a similar feature was introduced to JEB 3 in 2020, for Android Dalvik code.
  2. sha256 43816c47315aab27e50e6f895774a7b86d591807179e1d3262446ab7d68a56ef also available as lib/arm64-v8a/ in 309d848275aa128ebb7e27e570e5a2876977122625638630a6c61f7434b771c3
  3. “unsafe” in the context of decompilation; unsafe here is not to be understood as, “could any code be executed on the machine”, etc.