Control-flow unflattening in the wild

Both JEB decompiler engines [1] ship with code optimizers capable of rebuilding methods whose control flow was transformed by flattening obfuscators.

Image © Tigress (University of Arizona)

Control-flow flattening, sometimes referred to as chenxification [2], is an obfuscation technique employed to destructure a routine's control flow. While a compiled routine is typically composed of a number of basic blocks having low ingress and egress counts, a flattened routine may exhibit an outlier node having high input and output edge counts and, generally, a very high centrality in the graph (in terms of vertex betweenness). Practically speaking, the original method M is reduced to a many-way conditional block H that evaluates an expression VPC and dispatches the flow of execution to units of code, each one performing a part of M, updating VPC, and looping back to H. In effect, the original structured code is reduced to a large switch-like block whose execution is guided by a synthetic variable VPC. Therefore, the original flow of control, critical to inferring meaning while performing manual reverse-engineering, is lost. [3]
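To make the M / H / VPC description concrete, here is a minimal hand-written sketch. It is not the output of any real obfuscator; the class name, the method names, and the block numbering are invented for illustration. Each basic block of the structured method becomes one case of the dispatcher, and vpc decides which block runs next.

```java
// Illustrative only: a hand-flattened method, not output from a real obfuscator.
final class FlatteningDemo {
    // Original, structured method M: sum of the even numbers in [0, n).
    static int original(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            if (i % 2 == 0) {
                sum += i;
            }
        }
        return sum;
    }

    // Flattened equivalent: every basic block becomes a case of dispatcher H,
    // and the synthetic variable vpc selects which block runs next.
    static int flattened(int n) {
        int sum = 0;
        int i = 0;
        int vpc = 0;
        while (true) {
            switch (vpc) { // dispatcher H
                case 0: vpc = (i < n) ? 1 : 4; break;       // loop test
                case 1: vpc = (i % 2 == 0) ? 2 : 3; break;  // if test
                case 2: sum += i; vpc = 3; break;           // if body
                case 3: i++; vpc = 0; break;                // loop increment
                case 4: return sum;                         // exit block
                default: throw new IllegalStateException("unexpected vpc " + vpc);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(original(10) + " " + flattened(10)); // prints "20 20"
    }
}
```

Both methods compute the same value, but in the flattened one the loop and the conditional are no longer visible as such: every block has the switch as its only predecessor and successor.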

We upgraded dexdec’s control-flow unflattener earlier this year. [4] The v2 of the unflattener is more generic than our original implementation. It is able to cover cases in which the obfuscated code does not map to the clean model presented above, e.g. cases where the dispatcher stands out.

This week, we encountered an instance of code that was auto-deobfuscated to clean code and thought it’d be a good example to show how useful generic deobfuscation of such code can be. It seems that the obfuscator that was used to protect the original code was BlackObfuscator, a project used by clean apps and malware alike.

Hash: 92ae23580c83642ad0e50f19979b9d2122f28d8b3a9d4b17539ce125ae8d93eb

Before deobfuscation.

After deobfuscation, the code looks like:

After deobfuscation.

If you encounter examples where the unflattener does not perform adequately, please let us know. We’ll see if they can be fixed or upgraded to cover obfuscation corner-cases.

Thank you & until next time — Nicolas.

  1. dexdec is JEB’s dex/dalvik decompiler; gendec is JEB’s generic decompiler, used for native code and any code other than dex/dalvik.
  2. A term coined by University of Arizona’s Prof. Christian Collberg, because an early description of this technique was presented by Dr. Chenxi Wang in her PhD thesis.
  3. Control-flow flattening can be seen as a particular case of code virtualization, which was covered in previous blog entries.
  4. JEB 4.25 released on Jan 17 2023

Recovering JNI registered natives, recovering protected string constants

This is part 2 of the blog that introduced the major addition that shipped with JEB Pro 4.29: the ability for the dex decompiler to call into the native analysis pipeline, the generic decompiler and native code emulator.

Today, we demo how to use two plugins shipping with JEB 4.30, making use of the emulators to recover information protected by a native code library found in several APKs, libpairipcore.so.

Recovering statically registered native routines

The first plugin can be used to discover native routines registered via JNI’s RegisterNatives. As a reminder, when a native method is called from Java, the JNI checks whether an exported routine with a specific name derived from the Java method signature exists in the process. Alternatively, the binding between a Java native method and its actual body can be established with RegisterNatives. Typically, this is done in JNI_OnLoad, the primary entry-point. However, it does not have to be; other techniques exist to further obfuscate the target call site of a Java native method, such as unregistration/re-registration, the obfuscation of JNI_OnLoad, etc. More information can be found here.
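For reference, the name-based lookup can be sketched as follows. This is a small helper written for this post, not part of JEB; the class and method names (JniNames, mangle, shortName) are invented. It applies the escaping rules of the JNI specification to build the short exported name the runtime would look for. A method bound through RegisterNatives needs no such export, which is why it cannot be found by name alone.

```java
// Hypothetical helper: builds the short JNI export name for a native method.
final class JniNames {
    // Escape one name following the JNI specification's mangling rules.
    static String mangle(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '.' || c == '/') {
                sb.append('_');                 // package separator
            } else if (c == '_') {
                sb.append("_1");
            } else if (c == ';') {
                sb.append("_2");                // only appears in signatures
            } else if (c == '[') {
                sb.append("_3");                // only appears in signatures
            } else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')) {
                sb.append(c);
            } else {
                sb.append(String.format("_0%04x", (int) c)); // e.g. '$' -> _00024
            }
        }
        return sb.toString();
    }

    // Short name: used when the native method is not overloaded.
    static String shortName(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }

    public static void main(String[] args) {
        // prints "Java_com_pairip_VMRunner_executeVM"
        System.out.println(shortName("com.pairip.VMRunner", "executeVM"));
    }
}
```

Overloaded native methods additionally get a "__" suffix followed by the mangled argument signature, which this sketch leaves out.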

In its current state, the plugin will attempt to emulate an SO library’s JNI_OnLoad on its own, without the context of the app process it would normally run in. The advantage is that the plugin is usable on libraries recovered without their container app (APK or otherwise). The drawback is that it may fail in complex cases, since the full app context is not available to this plugin. (Note that the second plugin does not suffer from this limitation.)

Open an APK or ELF SO file(s), then run the “Recover statically-registered natives (Android)” plugin.
Set optional name filters or architecture filters as needed.
The results will be visible in the log. In this case, it looks like the aarch64 library libpairipcore.so registered one method for com.pairip.VMRunner.executeVM, and mapped it to a routine at 0x5F180.

Recovering constants removed from the Dex

The second plugin makes use of an IEmulatedAndroid object to simulate an execution environment and execute code that may be restoring static string constants removed from the Dex by code protection systems.

We can imagine that the code protection pass works as such:

String constants are being removed during a protection pass.

The implementation details of restore() are not relevant to this blog entry. In the case of that particular app, it involves calling into a highly obfuscated native library called libpairipcore.so.
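As a purely hypothetical sketch of such a pass (every name below is invented, and the XOR decoding merely stands in for the native call made by the real protector), the "before" class holds a plain constant, while the "after" class has lost it and relies on a restore() call made at class-initialization time:

```java
// Hypothetical illustration; class, field, and method names are invented.
final class ProtectionPassDemo {
    // Before protection: the constant sits in the dex in clear.
    static final class Before {
        static final String KEYWORD = "secret";
    }

    // After protection: the constant is gone and is rebuilt at class init.
    static final class After {
        static String KEYWORD;

        static {
            restore(); // runs once, when the class is initialized
        }

        // Stand-in for the real restore(), which would call into native code.
        // Here: XOR every char of an encoded blob with 0x20.
        static void restore() {
            char[] blob = {'S', 'E', 'C', 'R', 'E', 'T'};
            for (int i = 0; i < blob.length; i++) {
                blob[i] ^= 0x20;
            }
            KEYWORD = new String(blob);
        }
    }

    public static void main(String[] args) {
        System.out.println(After.KEYWORD); // prints "secret"
    }
}
```

A static decompilation of the "after" class shows only an uninitialized field; the value appears once the initializer has actually run, which is what emulating the method achieves.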

The plugin requires a full APK. It will emulate a static method selected by the user and let them know about the constants that were restored.

The plugin workflow is as follows:

After loading an APK, the plugin may let the user know that the code was protected.
Execute the “Recover removed Dex constants” plugin.
The user will be asked to input the no-arg static method that should be simulated. If a suitable one is found, it may be pre-populated by the plugin.
The execution can be lengthy, from several seconds to several minutes. Recovered strings are registered as field comments as well as decompiler events in the relevant dexdec unit of your project.

Conclusion

That’s it for today. Make sure to update to JEB Pro 4.30 if you want to use those plugins.

I would encourage power-users to explore JEB’s API, in particular IDState, EState/EEmulator and IEmulatedAndroid, if they want to experiment or work on code that requires specific hooks (dex hooks, jvm sandbox hooks, native emu hooks, native memory hooks – refer to the registerXxxHooks methods in IDState) for the emulators to operate properly.

Until next time — Nicolas.

Android JNI and Native Code Emulation

JEB 4.29 finally bridges the gap between the dex analysis modules in charge of code emulation (dexdec’s IDState and co.) and their counterparts in the native code analysis pipeline (gendec’s EEmulator, EState and co.).

The emulation of JNI routines from dexdec unlocks use-cases that are now becoming commonplace, such as:

  • Object consumption relying on native code calls to make reverse-engineering harder. The typical case is the retrieval of encrypted strings where part of the decryption code is bytecode, part is native code.
  • General app tweaking done on the native side, such as field setting, field reading, method invocation, object creation, etc.

Example

Here is an example of what could not be done by JEB <4.29:

//
// dex code:
//

package a.b;

class X {
  ...
  native String decrypt(char[] array, int key1, int key2);
  ...
  String f() {
    return decrypt(new char[]{'K', 'F', 'C'}, 4, 3);
  }
  ...
}

//
// native code:
//

// pseudo-code for method `dec` mapping to `a.b.X.decrypt`
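jstring dec(JNIEnv* env, jobject thiz, jcharArray array, jint a, jint b) {
  jsize len = (*env)->GetArrayLength(env, array);
  // a jcharArray is an opaque handle: copy its elements out before using them
  jchar out[len];
  (*env)->GetCharArrayRegion(env, array, 0, len, out);
  for(jsize i = 0; i < len; i++) {
    out[i] -= (a - b);
  }
  return (*env)->NewString(env, out, len);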
}

JEB used to decompile X.f() to:

String f() {
  return decrypt(new char[]{'K', 'F', 'C'}, 4, 3);
}

JEB 4.29, if the native emulator is enabled, is able to return a simpler version:

String f() {
  return "JEB";
}
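To check the arithmetic behind that result, here is a pure-Java replica of the native routine. It only mirrors the logic of the example above and says nothing about how JEB evaluates native code internally; the class name is invented.

```java
// Replica of the example's native routine, written in Java to verify the result.
final class DecryptReplica {
    // Subtract (key1 - key2) from every character of the input array.
    static String decrypt(char[] array, int key1, int key2) {
        char[] out = new char[array.length];
        for (int i = 0; i < array.length; i++) {
            out[i] = (char) (array[i] - (key1 - key2));
        }
        return new String(out);
    }

    public static void main(String[] args) {
        // 'K' - 1 = 'J', 'F' - 1 = 'E', 'C' - 1 = 'B'
        System.out.println(decrypt(new char[]{'K', 'F', 'C'}, 4, 3)); // prints "JEB"
    }
}
```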

Preparation

Currently, the native emulator is disabled by default. In order to let dexdec use it, edit your dexdec-emu.cfg file (located in your coreplugins/ folder; in the GUI, use the Android menu, handler Emulator Settings…):

  • Mandatory: set enable_native_code_emulator to true
  • Recommended: increase the values of emu_max_duration and emu_max_itercount (the reason being that the analysis of native images by the native code plugins can be quite time-consuming).

You will also need a JEB Pro license to use this feature.

Output

As usual, the auto-decryption of an item will also emit an event, which can be collected programmatically and is visible in the Decompiler’s “Events” fragment in the GUI.

Items whose address is formatted as @LIB:<lib.so>@NativeAddress are decrypted native items that were found in the SO image at some point.

Decrypted strings collected by the decompiler

Similarly, decrypted items found in decompiled code are rendered using a purple’ish pink (by default) in the GUI.

If native code was involved in the decryption, the on-hover pop-up will let you know:

Decryption of that string required emulation of native code

API

The native emulator(s) managed by a dexdec IDState can be customized with the following newly-added methods and types:

  • enableNativeCodeEmulator / isNativeCodeEmulatorEnabled : enable or disable the native emulator (the master setting is pulled from your config file, dexdec-emu.cfg)
  • registerNativeEmulatorHooks / unregisterNativeEmulatorHooks : hooks into the evaluation (emulation) of the native code – refer to the appropriate hooks interfaces. The hooks receive a reference to the controlling EEmulator.
  • registerNativeEmulatorMemoryHooks / unregisterNativeEmulatorMemoryHooks : hooks into the memory accesses of the emulator’s state – refer to the appropriate hooks interfaces. The hooks receive a reference to the target EState object.

Conclusion

Interfacing both emulators offers many possibilities to improve the reverse-engineering experience of complex binaries and applications.

There is more that can be done, which will be discussed in future blog posts:

  • Retrieval of statically registered natives (through JNIEnv’s RegisterNatives) as opposed to native routines automatically resolved using the JNI naming conventions.
  • Automatic unpacking of native code.
  • Use of the native emulator in custom scripts and plugins.

Note that this feature is currently limited to JEB Pro.

The JNI native code emulator will work with x86, x64, and arm64 code (we may add support for arm in the near future). Needless to say, it is still experimental! Therefore, you may encounter strange results or problems while analyzing code making use of it. Please send error reports to support@pnfsoftware.com.

Until next time, and once again, thank you to our amazing users for their continued support and kind words 🙂 — Nicolas.