Reversing DexGuard

GuardSquare’s DexGuard, the commercial offshoot of ProGuard geared toward Android app protection, has grown significantly since its introduction in 2013. Technically, what started as “ProGuard + basic string encryption + code reflection” tool evolved into a multi-platform, complex solution including: control-flow obfuscation, complex and varied data and resources encryption, bytecode encryption, virtual environment and rooted system detection, application signature and certificate pinning enforcement, native code protection, as well as bytecode virtualization 1, and more.

GuardSquare seems to be keeping a tight lid on their product’s specs, as their website does not provide public datasheet or white papers. In this article, we present the result of reverse-engineering a high-profile app protected with DexGuard, which will allow us to infer a subset of DexGuard’s capabilities.

This article presents several bytecode obfuscation used by DexGuard, as well as DexGuard facility made available at runtime to protected programs 2. The analysis that follows was done statically, with JEB 3.20.

Identification

Identifying apps protected by DexGuard is relatively easy. It seems the default bytecode obfuscation settings place most classes in the o package, and some will be renamed to invalid names on a Windows system, such as con or aux. That being said, DexGuard’ed apps may not be as easily identified. Closer inspection of the code will reveal stronger hints than obfuscated names: decryption stubs, specific encrypted data, the presence of some so library files, are all tell tale signs of DexGuard, as shown below.

Running a Global Analysis

Let’s run a Global Analysis (menu Android, Global analysis…) with standard settings on the file and see what gets auto-decrypted and auto-unreflected:

Results subset of Global Analysis (redacted areas are meant to keep the analyzed program anonymous; it is a clean app, whose business logic is irrelevant to the analysis of DexGuard)

Lots of strings were decrypted, many of them specific to the app’s business logic itself, others related to DexGuard RASP – that is, library code embedded within the APK, responsible for performing app signature verification for instance, which is itself protected by DexGuard. That gives us valuable pointers into where we should be looking at if we’d like to focus on the protection code specifically.

Deobfuscating Code

The first section of this blog focuses on bytecode obfuscation and how JEB deals with it. It is mostly automated, but a final step requires manual assistance to achieve the best results.

Most obfuscated routines exhibit the following characteristics:

  • Dynamically generated strings via the use of per-class decryption routines
  • Most calls to external routines are done via reflection
  • Flow obfuscation via the use of a couple of opaque integer fields – let’s call them OPI0, OPI1. They are class fields generally initialized to 0 and 1.
  • Arithmetic operation obfuscation
  • Garbage code insertion
  • Unusual protected block structure, leading to fragmented try-blocks, unavoidable to produce semantically accurate raw code

As an example, the following class is used to perform app certificate validation in order, for instance, to prevent resigned apps from functioning. A few items were renamed for clarity; decompilation is done with disabled Deobfuscators (MOD1+TAB, untick “Enable deobfuscators”):

Take #1 (snippet) – The protected class is decompiled without deobfuscation in order to show semi-raw output (a few optimizers doing all sort of code cleanup are not categorized as deobfuscators internally, and will perform even if Deobfuscation is disabled). Note that a few items were also renamed for clarity.

In practice, such code is quite hard to comprehend on complex methods. With obfuscators enabled (the default setting), most of the above will be cleared.

See the re-decompilation of the same class, below.

  • strings are decrypted…
  • …enabling unreflection
  • most obfuscation is removed…
  • except for some control flow obfuscation that remains because JEB was unable to process OPI0/OPI1 directly (below,
Take #2 (full routine) – obfuscators enabled (default). The red blocks highlight use of opaque variables used to obfuscate control flow.

Let’s give a hint to JEB as to what OPI0/OPI1 are.

  • When analyzing DexGuard’ed apps, you can rename OPI0 and OPI1 to guard0 and guard1, respectively, to allow JEB go aggressively clean the code
  • Redecompile the class after renaming the fields
Take #3 (full routine) – with explicit guard0/guard1

That final output is clean and readable.

Other obfuscation techniques not exposed in this short routine above are arithmetic obfuscation and other operation complexification techniques. JEB will seamlessly deal with many of them. Example:

is optimized to

To summarize bytecode obfuscation as of DexGuard 8.4:

  • decryption and unreflection is done automatically 3
  • garbage clean-up, code clean-up is also generic and done automatically
  • control flow deobfuscation needs a bit of guidance to operate (guard0/guard1 renaming)

Runtime Verification

RASP library routines are used at the developers’ discretion. They consist of a set of classes that the application code can call at any time, to perform tasks such as:

  • App signing verification
  • Debuggability/debugger detection
  • Emulator detection
  • Root detection
  • Instrumentation toolkits detection
  • Certificate pinning
  • Manifest check
  • Permission checks

The client decides when and where to use them as well as what action should be taken on the results. The code itself is protected by DexGuard, that goes without saying.

App Signing Verification

  • Certificate verification uses the PackageManager to retrieve app’s signatures: PackageManager.getPackageInfo(packageName, GET_SIGNATURES).signatures
  • The signatures are hashed and compared to caller-provided values in an IntBuffer or LongBuffer.

Debug Detection

Debuggability check

The following checks must pass:

  • assert that Context.ctx.getApplicationInfo().flags & ApplicationInfo.FLAG_DEBUGGABLE is false
  • check the ro.debuggable property, in two ways to ensure consistency
    • using android.os.SystemProperties.get() (private API)
    • using the getprop‘s binary
  • verify that no hooking framework is detected (see specific section below)

Debugging session check

The following checks must pass:

  • assert that android.os.Debug.isDebuggerConnected() is false
  • verify no tracer process: tracerpid entry in /proc/<pid>/status must be <= 0
  • verify that no hooking framework is detected (see specific section below)

Debug key signing

  • enumerate the app’s signatures via PackageInfo.signatures
  • use getSubjectX500Principal() to verify that no certificate has a subject distinguished name (DN) equals to "CN=Android Debug,O=Android,C=US", which is the standard DN for debug certificates generated by the SDK tools

Emulator Detection

Emulator detection is done by checking any of the below.

1) All properties defined in system/build.prop are retrieved, hashed, and matched against a small set of hard-coded hashes:

86701cb958c69d64cd59322dfebacede -> property ???
19385aafbb452f39b5079513f668bbeb -> property ???
24ad686ec83d904347c5a916acbe1779 -> property ???
b8c8255febc6c46a3e43b369225ded3e -> property ???
d76386ddf2c96a9a92fc4bc8f829173c -> property ???
15fed45d5ca405da4e6aa9805daf2fbf -> property ??? (unused)

Unfortunately, we were not able to reverse those hashes back to known property strings – however, it was tried only on AOSP emulator images. If anybody wants to help and run the below on other build.prop files, feel free to let us know what property strings those hashes match to. Here is the hash verification source, to be run be on build.prop files.

2) The following file is readable:

/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq

3) Verify if any of those qemu, genymotion and bluestacks emulator files exist and are readable:

/dev/qemu_pipe
/dev/socket/baseband_genyd
/dev/socket/genyd
/dev/socket/qemud
/sys/qemu_trace
/system/lib/libc_malloc_debug_qemu.so
/dev/bst_gps
/dev/bst_time
/dev/socket/bstfolderd
/system/lib/libbstfolder_jni.so

4) Check for the presence of wired network interfaces: (via NetworkInterface.getNetworkInterfaces)

eth0
eth1

5) If the app has the permission READ_PHONE_STATE, telephony information is verified, an emulator is detected if any of the below matches (standard emulator image settings):

- "getLine1Number": "15555215554", "15555215556", "15555215558", "15555215560", "15555215562", "15555215564", "15555215566", "15555215568", "15555215570", "15555215572", "15555215574", "15555215576", "15555215578", "15555215580", "15555215582", "15555215584"
- "getNetworkOperatorName": "android"
- "getSimSerialNumber": "89014103211118510720"
- "getSubscriberId": "310260000000000"
- "getDeviceId": "000000000000000", "e21833235b6eef10", "012345678912345"

6) /proc checks:

/proc/ioports: entry "0ff :" (unknown port, likely used by some emulators)
/proc/self/maps: entry "gralloc.goldfish.so" (GF: older emulator kernel name)

7) Property checks (done in multiple ways with a consistency checks, as explained earlier), failed if any entry is found and start with one of the provided values:

- "ro.product.manufacturer": "Genymotion", "unknown", "chromium"
- "ro.product.device": "vbox86p", "generic", "generic_x86", "generic_x86_64"
- "ro.product.model": "sdk", "emulator", "App Runtime for Chrome", "Android SDK built for x86", "Android SDK built for x86_64"
- "ro.hardware": "goldfish", "vbox86", "ranchu"
- "ro.product.brand": "generic", "chromium"
- "ro.kernel.qemu": "1"
- "ro.secure": "0"
- "ro.build.product": "sdk", "vbox86p", "full_x86", "generic_x86", "generic_x86_64"
- "ro.build.fingerprint": "generic/sdk/generic", "generic_x86/sdk_x86/generic_x86", "generic/google_sdk/generic", "generic/vbox86p/vbox86p", "google/sdk_gphone_x86/generic_x86"
- "ro.bootloader": "unknown"
- "ro.bootimage.build.fingerprint": "Android-x86"
- "ro.build.display.id": "test-"
- "init.svc.qemu-props" (any value)
- "qemu.hw.mainkeys" (any value)
- "qemu.sf.fake_camera" (any value)
- "qemu.sf.lcd_density" (any value)
- "ro.kernel.android.qemud" (any value)

Hooking Systems Detection

The term covers a wide range of techniques designed to intercept regular control flow in order to examine and/or modify execution.

1) Xposed instrumentation framework detection, by attempting to load any of the classes:

de.robv.android.xposed.XposedBridge
de.robv.android.xposed.XC_MethodHook

Class loading is done in different ways in an attempt to circumvent hooking itself, using Class.forName with a variety of class loaders, custom class loaders and ClassLoader.getLoadedClass, as well as lower-level private methods, such as Class.classForName.

2) Cydia Substrate instrumentation framework detection.

3) ADBI (Android Dynamic Binary Instrumentation) detection

4) Stack frame verification: an exception is generated in order to retrieve a stack frame. The callers are hashed and compared to an expected hard-coded value.

5) Native code checks. This will be detailed in another blog, if time allows.

Root Detection

While root detection overlaps with most of the above, it is still another layer of security a determined attacker would have to jump over (or walk around) in order to get protected apps to run on unusual systems. Checks are plenty, and as is the case for all the code described here, heavily obfuscated. If you are analyzing such files, keeping the Deobfuscators enabled and providing guard0/guard1 hints is key to a smooth analysis.

Static initializer of the principal root detection class. Most artifacts indicative of a rooted device are searched for by hash.

Build.prop checks. As was described in emulator detection.

su execution. Attempt to execute su, and verify whether su -c id == root

su presence. su is looked up in the following locations:

/data/local/
/data/local/bin/
/data/local/xbin/
/sbin/
/system/bin/
/system/bin/.ext/
/system/bin/failsafe/
/system/sd/xbin/
/system/usr/we-need-root/
/system/xbin/

Magisk detection through mount. Check whether mount can be executed and contains databases/su.db (indicative of Magisk) or whether /proc/mounts contains references to databases/su.db.

Read-only system partitions. Check if any system partition is mounted as read-write (when it should be read-only). The result of mount is examined for any of the following entries marked rw:

/system
/system/bin
/system/sbin
/system/xbin
/vendor/bin
/sbin
/etc

Verify installed apps in the hope of finding one whose package name hashes to the hard-coded value:

0x9E6AE9309DBE9ECFL

Unfortunately, that value was not reversed, let us know if you find which package name generates this hash – see the algorithm below:

    public static long hashstring(String str) {
        long h = 0L;
        for(int i = 0; i < str.length(); i++) {
            int c = str.charAt(i);
            h = h << 5 ^ (0xFFFFFFFFF8000000L &amp; h) >> 27 ^ ((long)c);
        }
        return h;
    }

NOTE: App enumeration, as is common with DexGuard, is performed in two ways to maximize chances of evading partial hooks.

  • Straightforward: PackageManager.getInstalledApplications
  • More convoluted: iterate over all known MAIN intents: PackageManager.queryIntentActivities(new Intent("android.intent.action.MAIN")), derive the package name from the intent via ResolveInfo.activityInfo.packageName

SElinux verification. If the file /sys/fs/selinux/policy cannot be read, the check immediately passes. If it is readable, the policy is examined and hints indicative of a rooted device are looked for by hash comparison:

472001035L
-601740789L

The hashing algorithm is extremely simple, see below. For each byte of the file, the crc is updated and compared to hard-coded values.

long h = 0L;
//for each byte:
    h = (h << 5 ^ ((long)(((char)b)))) &amp; 0x3FFFFFFFL;
    // check h against known list

Running processes checks. All running processes and their command-lines are enumerated and hashed, and specific values are indirectly looked up by comparing against hard-coded lists.

APK Check

This verifier parses compressed entries in the APK (zip) file and compares them against well-known, hard-coded CRC values.

Manifest Check

Consistency checks on the application Manifest consists of enumerating the entries using two different ways and comparing results. Discrepancies are reported.

  • Open the archive’s MANIFEST.MF file via Context.getAssets(), parse manually
  • Use JarFile(Context.getPackageCodePath()).getManifest().getEntries()

Discrepancies in the Manifest could indicate system hooks attempting to conceal files added to the application.

Permissions Check

This routine checks for permission discrepancies between what’s declared by the app and what the system grant the app.

  • Set A: App permission gathering: all permissions requested and defined by the app, as well as all permissions offered by the system, plus the INTERACT_ACROSS_USERS and INTERACT_ACROSS_USERS_FULL permissions,
  • Set B: Retrieve all permissions that exist on the system
  • Define set C = B – A
  • For every permission in C, use checkCallingOrSelfPermission (API 22-) or checkSelfPermission (API 23+) to verify that the permission is not granted.

Permission discrepancies could be used to find out system hooks or unorthodox execution environments.

Note the “X & -(A+1) | ~X & A” checks. DexGuard makes use of multiple arithmetic tricks to complicate the obfuscation flow. Here, that expression is never equals to v2, and therefore, the if-check will always fail. JEB 3.20 does not clean all those artifacts.

Miscellaneous

Other runtime components include library code to perform SSL certificate pinning, as well as obfuscated wrappers around web view clients. None of those are of particular interest.

Wrapper for android.webkit.WebViewClient. Make sure to enable deobfuscators and provide guardX hints. When this is done, most methods will be crystal clear. In fact, the majority of them are simple forwarders.

Conclusion

That’s it for DexGuard obfuscation and runtime protection facility. Key take-away to analyze DexGuard’ed code:

  • Keep the obfuscators enabled
  • Locate the opaque integers, rename them to guard0/guard1 to give JEB a hint on where control flow deobfuscation should be performed, and redecompile the class

The second part in the series will focus on bytecode encryption and how such classes can be confidently retrieved and decrypted.

  1. VM in VM, repeat ad nauseam – something not new to code protection systems, it’s existed on x86 for more than a decade, but new on Android, and other players in this field, commercial and otherwise, seem to be implementing similar solutions.
  2. So-called “RASP”, a relatively new acronym for Runtime Application Self-Protection
  3. Decryption and unreflection are generic processes of dexdec (the DEX Decompiler plugin); there is nothing specific to DexGuard here. The vast majority or encrypted data, regardless of the protection system in place, will be decrypted.

JEB Android Updates – Generic String Decryption, Lambda Recovery, Unreflecting Code, and More

Updated on March 11.

A note about 2020 Q1 updates (versions 3.10 to 3.16) regarding the DEX/Dalvik decompiler modules:

  • Generic String Decryption
  • Lambda Recovery
  • Unreflecting Code
  • Decompiling Java Bytecode
  • Auto-Rename All

Generic String Decryption

JEB ships with a generic deobfuscator that can perform on-the-fly string decryption and other complex optimizations. Although this optimizer performs safe (i.e., guaranteed) optimizations in most cases, it is unsafe in the general case case and therefore, may be disabled in the options. Refer to the Engines options .parsers.dcmp_dex.EnableDeobfuscators and .parsers.dcmp_dex.EmulationSupport.

Many code protectors such as DexGuard, Arxan, Dash-O, Allatori, etc. offer options to replace immediate string constants by method invocations that perform on-the-fly decryption.

A variety of techniques exist, ranging from simple one-off trivial decryptor methods, to complex schemes involving object(s) creation, complicated decryptors injected in third-party packages, non-trivial logic, junk code meant to slow down analyzers, use of opaque predicates, etc. They are implemented in an infinite number of ways. JEB’s generic deobfuscator can perform quick, safe emulation of the intermediate representation to provide a replacement. It may sometimes fail or bail out due to several reasons, such as performance or pitfalls like anti-emulation and anti-sandboxing techniques.

Example 1

The string decryptor is a static method reading encrypted string data in a class byte array. Also note that code reflection is used.

The above code (blue box) ends up being deobfuscated to:

Example 2:

The decryption methods were injected into library packages, e.g. Gson’s

The above code is deobfuscated to:

With the generic string decryptor optimizer ON

Below, a decryptor that had been injected into the com.google.gson.Gson() class:

The concat(String, int) method is not part of standard Gson, of course. It was injected by the protector and is used by (some) code to perform on-the-fly string decryption.

Example 3:

One last example, which was involuntarily – yet, quite timely! – provided by a user:

Static fields initialized with decrypted strings.
After decryption.

Decrypting all strings: The decryptor kicks in when decompiling methods only. At the moment, if a string happens to be successfully decrypted, the optimizer does not attempt to recover all similarly encrypted strings in the code, although it is most certainly an addition that will make it in a future software update.

Rendering: You may quickly identify decrypted strings in the client as they are rendered using a special color associated with the itemId STRING_GENERATED, by default rendered in a flashy pink color in light and dark themes. Hovering over such items will bring up a pop-up with additional origin information, like the underlying code that would have generated that string:

Auto-decrypted strings in pink; overlays with source/origin information.

API:
– From a DEX perspective: Generated strings are artificial. Therefore, IDexString.isArtificial() would return true.
– From a Java/AST perspective: IJavaConstant objects that embed origin information do so using the “origin” tag. Use IJavaConstant.getTags().get("origin") to retrieve it.

Lambda Recovery

JEB attempts to perform Java 8 style lambda recovery and reconstruction.

Desugared Lambdas

Recovery and reconstruction does not rely on any type of metadata 1, such as special prefixes -$$Lambda$ for classes and methods implementing desugared lambdas in dex 37-.

You may therefore see constructs like this:

This DEX file contains desugared, non-obfuscated lambdas.
This DEX file contained desugared, obfuscated lambdas

Options: Lambda reconstruction can be disabled in the options (Edit, Options, Engines, …). Lambda rendering can also be disabled in the options, as well as on-demand by right-clicking a decompiled view, Rendering Options….

Lambdas options

API Note: In the above cases, the underlying Java AST may be a IJavaNew or IJavaStaticField node. This is not the case for real (not desugared) lambdas, which map to an IJavaCall node – see below.

Real Lambdas

Lambda reconstruction also takes place when the code has not been desugared (which is rare!), i.e. code relying on dex38’s invoke-custom and invoke-polymorphic.

This DEX file contains real lambdas implemented via invoke-custom

API Note: Such lambdas map to an IJavaCall node for which isCustomCall() will return true.

Unreflecting Code

Many code protectors make heavy use of reflection – combined with string encryption, as we’ll see below – to obfuscate code. In practice, reflection is limited to method invocation (static and virtual), static and non-static field setting and getting, and new instance creation. A few examples:

v = Class.forName("java.lang.Integer").getMethod("valueOf",
        String.class).invoke(null, str);
// instead of 
v = Integer.valueOf(str);
Class.forName("SomeClassName").getField("b").setInt(x, 4);
// instead of 
x.b = 4;
Class.forName("java.lang.String").getConstructor(byte[].class)
        .newInstance(val);
// instead of
new String(arg6);

Such code is generally protected by a catch-all handler that forwards the cause of any exception raised by a reflection issue:

try {
    // ...
}
catch(Throwable e) {
    throw e.getCause();
}

By default, JEB will attempt to unreflect code. This deobfuscator is potentially unsafe and may be disabled in the options. Note that you always have the ability to choose, for a particular decompilation, whether some options should be temporarily enabled or disabled, by pressing CTRL+TAB (or COMMAND+TAB on macOS) to decompile (same as menu Action, Decompile with options…).

Unsafe deobfuscators can be globally disabled.

So, in a nutshell, code normally decompiled to:

Reflection, not cleaned (malware obfuscated by an older version of DexGuard (?))

will be decompiled to:

With reflection cleaned

Technical Note: This optimizer works on the Intermediate Representation manipulated by the decompiler, not to be confused with the AST rendered as its output. (The AST cleaner that was described in an older post is more limited than this IR optimizer.)

Last-step failures: Successfully unreflecting code eventually depends on being able to find the intended target method or field matching the provided description (method parameter types or field type). Failure to do so will generate a log like "A candidate field/method/constructor for unreflection was not found".

Decompiling Java Bytecode

JEB supports JLS bytecode decompilation for *.class files and jar-like archives (jar, war, ear, etc.). The Java bytecode is converted to Dalvik using Android’s dx by default. Users may choose to use d8 (not recommended for now) instead by selecting so in the Options.

The resulting DEX file(s) are processed as usual.

You may use this to decompile Android Library files (*.aar files) in JEB.

Examining the android-arch-core-runtime library

Auto-Rename All

JEB 3.13 introduced a new generic action, Auto-Rename All. Its implementation is at the discretion of code plugins. The DEX plugin implements it, therefore users may execute Action, Auto-Rename All… at any time (generally after processing an obfuscated file) in order to rename code items such as field, method, or class names, to something more easily processable for our -limited- human brains.

Look at this horrendous obfuscation scheme below. It’s using right-to-left unicode characters to seriously mess up rendering:

Obfuscated name using RTL Arabic characters

Let’s run Action, Auto-Rename All… on this file:

Auto-Renaming capabilities are provided (optionally) by plugins.
After auto-renaming code items in the above file. Not clearer in terms of meaning, but at least, it’s something we can start working on.

As usual, feel free to join us on Slack, message us on Twitter, or email us privately at support@pnfsoftware.com.

Until next time!

  1. Relying on metadata leads to false negatives in the best case – e.g., when the code has been minified by something like ProGuard; it leads to false positives in the worst case – e.g. forged metadata to incite the decompiler to generate inaccurate or wrong code.

Traveling Around Mars With C Emulation (Part 1)

In previous blog posts, we explained how JEB’s custom Intermediate Representation can serve to analyze an executable and perform advanced deobfuscation. Now it’s time to turn to the final output produced by JEB native decompilers: C code1!

In this series of blog posts, we will describe our journey toward analyzing a heavily obfuscated crackme dubbed “MarsAnalytica”, by working with JEB’s decompiled C code.

To reproduce the analysis presented in this post, make sure to update JEB to version 3.1.3+.

MarsAnalytica Challenge Reconnaissance

MarsAnalytica crackme was created by 0xTowel for NorthSec CTF 2018. The challenge was made public after the CTF with an intriguing presentation by its author:

My reverse engineering challenge ‘MarsAnalytica’ went unsolved at #nsec18 #CTF. Think you can be the first to solve it? It features heavy #obfuscation and a unique virtualization design.

0xTowel

Given that exciting presentation, I decided to use this challenge mainly as a playground to explore and push JEB’s limits (and if we happen to solve it on the road, that would be great!).

The MarsAnalytica sample analyzed in these blogs is the one available on 0xTowel’s GitHub 2. Another version seems to be available on RingZer0 website, called “MarsReloaded”.

So, let’s examine the beast! The program is a large x86-64 ELF (around 10.8 MB) which, once executed, greets the user like this:

Inserting a dummy input gives:

So I guess we have to find a correct Citizen ID! Now let’s open the executable in JEB. First, the entry point routine:

Entry Point

Ok, the classic libc entry point, now let’s look at strings and imports:

A few interesting imports: getchar() to read user input, and putchar() and puts() to write. Also, some memory manipulation routines, malloc() and memcpy(). No particular strings stand out though, not even the greeting message we previously saw. This suggests we might be missing something.

Actually, looking at the native navigation bar (right-side of the screen by default), it seems JEB analyzed very few areas of the executable:

Navigation Bar
(green is cursor’s location, grey represents area without any code or data)

To understand what happened let’s first look at JEB’s notifications window (File > Notifications):

Notifications Window

Two interesting notifications here: first the file was deemed “malformed/obfuscated”, due to its sections being stripped, and second the analysis style was initially set to CONSERVATIVE.

The CONSERVATIVE analysis style means JEB was cautious during disassembly; it only followed safe control-flow relationships (i.e. branches with known targets), and searched for common routine prologue patterns to find unreferenced routines (e.g. push rbp / mov rbp, rsp)

This likely explains why most of the executable was not analyzed: the control-flow could not be safely followed and the unreferenced code does not start with common prologue patterns.

JEB usually employs AGGRESSIVE analysis on standard Linux executables, and disassembles (almost) anything within code areas (also known as “linear sweep disassembly”). In this case, JEB went CONSERVATIVE because the ELF file looks non-standard (sections are stripped).

Explore The Code (At Assembly Level)

Let’s take a look at the actual main() (first argument of __libc_start_main()):

main() code
(part 4)

Ok… that’s where the fun begins!

So, first a few memcpy() to copy large memory areas onto the stack, followed by series of “obfuscated” computations on these data. The main() routine eventually returns on an address computed in rax register. In the end, JEB’s disassembler was not able to get this value, hence it stopped analyzing there.

Let’s open the binary in JEB debugger, and retrieve the final rax value at runtime: 0x402335. We ask JEB to create a routine at this address (“Create Procedure”, P), and end up on very similar code. After manually following the control-flow, we end up on very large routines — around 8k bytes –, with complex control-flow, built on similar obfuscated patterns.

And yet at this point we have only seen a fraction of this 10MB executable… We might naively estimate that there is more than 1000 routines like these, if the whole binary is built this way (10MB/8KB = 1250)!

It should be noted that most of the obfuscated routines re-use the same stack area (initialized in main() with the series of memcpy()).

Disassemble’em All!

To automatically discover more routines, I configured JEB into disassembling “everything” within code areas, i.e. applying an AGGRESSIVE analysis rather than a CONSERVATIVE one 3.

This did not end well, due to an anti-disassembly trick that interleave useless 0xE8 bytes within correct code.

Correct disassembly
(i.e. faithful to what will be executed)

When disassembling linearly, those bytes will be considered as code, because 0xE8 is x86 opcode for the CALL instruction. Hence the disassembly listing ends-up de-synchronized to what will actually be executed:

Incorrect disassembly
(0xE8 byte has been disassembled as CALL, and “stole” the next bytes)

We could tweak the disassembly algorithm, but at this point we don’t know if there are others hidden “gifts” . Thus, it seems that the most robust option to correctly analyze the whole executable is to find a way to follow the control-flow.

Explore The Code (At C Level)

Let’s now take a look at the pseudo-C code produced by JEB for those first routines. For example, here is main():

Decompiled main()

The code is (pretty) nice and short! Actually… too short, a lot of the original code is not present in the decompiled code. What happened? Many instructions write values into the stack, and those values will not be re-used later on in the same routine; therefore the instructions have been deemed “useless” and removed.

But those written values will likely be used by the next routines, which share the same stack, as noted earlier. So we need to keep them if we want to have a chance to correctly analyze the code.

This can be done by configuring JEB to not apply “aggressive optimizations” during decompilation 4. Here is the new main() decompiled code:

Decompiled main() without aggressive optimizations

A few lengthy expressions, but it remains pretty decent, given the complexity of the original assembly code: 60 lines of C, most of them simple assignments, to represent around 200 non-trivial assembly instructions.

We Need A Plan

Current Understanding

The executable is divided into (not so small) handler routines, each of them passing control to the next one by computing its address. For that purpose, each handler reads values from a large stack, make a series of non-trivial computations on them, then write back new values into the stack.

After following manually a bunch of the handlers it seems the user input is only processed after a lot of them have been executed.

As originally mentioned by 0xTowel, the crackme author, it looks like a virtual-machine style obfuscation, where bytecodes are read from memory, and are interpreted to guide the execution.

Also, let’s notice that while the executable is impressively obfuscated, there are some “good news”:

  • There does not seem to be any self-modifying code, meaning that all the code is statically visible, we “just” have to compute the control-flow to find it.
  • JEB decompiled C code looks (pretty) simple, most C statements are simple assignments, except for some lengthy expression always based on the same operations; the decompilation pipeline simplified away parts of the complexity of the various assembly code patterns.
  • There are very few subroutines called (more on that in the next blogs), and also few system APIs, so most of the logic is contained within the chain of obfuscated handlers, connected through jmp rax or push rax/ret instructions.

Current Objective

Our first goal would be to find where the user input starts to be processed (typically a call to getchar()), and what is the exact memory state at this point (as we are likely going to need it to solve this madness).

Here Is A Plan

Given all that, we could pass through all the “deterministic” part of the execution (i.e. until the user’s input is processed) by implementing a C emulator.

The emulator would simulate the execution of each handler routine, update a memory state, and retrieve the address of the next handler, which can be described by the following pseudo-code:

emulatorState = initEmulator() // initialize memory
handlerAddress = 0x400DA9 // first handler (known address)
while(true){
  analyze(handlerAddress) // disassemble and decompile
  emulatorState = emulate(handlerAddress, emulatorState)
  handlerAddress = emulatorState.getRAX() // RAX=next handler
}

The program will then produce an execution trace, and provide us access to the exact program’s state. Hence, we should find at some point where the user’s input is processed (typically, a call to getchar()).

The main advantage of this approach is that we are going to work on C code, rather than assembly code. This will become handy later to analyze how the user’s input is processed.

Also, there are a few reasons I decided to go down that (unusual?) road, which have very little to do with MarsAnalytica challenge:

  • The emulator would be architecture-independent — several native architectures are decompiled to C by JEB –, allowing us to re-use it in situations where we cannot easily execute the target (e.g. MIPS/ARM).
  • It will be an interesting use-case for JEB public API to manipulate C code. Users could then extend the emulator to suit their needs.
  • This approach can only work if the decompilation is correct, i.e. if the C code remains faithful to the original native code. In other words, it allows to “test” the decompilation pipeline’s correctness, which is — as a JEB’s developer — interesting!

Nevertheless, a major drawback of emulating C code on this particular executable, is that we need the C code in the first place! Decompiling 10MB of obfuscated code is going to take a while; therefore this “plan” is certainly not the best one for time-limited Capture-The-Flag competitions.

Cliffhanger Ending

How to implement a C emulator with JEB API? Is MarsAnalytica decompiled code correct? Is it really such a bad good plan to use an emulator? Will I perish under performance problems?

Answers to those questions (and more) in part 2!


  1. While JEB’s default decompiled code follows (most of) C syntactic rules and their semantics, some custom operators might be inserted to represent low-level operations and ease the reading; hence strictly speaking JEB’s decompiled code should be called pseudo-C. The decompiled output can also be variants of C, e.g. the Ethereum decompiler produce pseudo-Solidity code.
  2. SHA1 of the UPX-packed executable: fea9d1b1eb9d3f93cea6749f4a07ffb635b5a0bc
  3. Changing the analysis style can be done with .parsers.x86_64.AnalysisStyle engine option
  4. Disabling aggressive optimizations can be done with .parsers.dcmp_x86_64.IROptimizerDisableAggressivePass engine option

JEB Native Analysis Pipeline – Part 2: IR Optimizers

In part 1 of this series, we gave an overview of the Intermediate Representation used by JEB’s Native Analysis Pipeline, as well as a simple Python script demonstrating how to use the API to access and print out IR-CFG of decompiled routines.

In part 2, we continue our exploration of JEB IR. We will show how to write a custom IR optimizer plugin to clean-up a custom obfuscation used in a piece of code. The resulting decompiled C code will end up very readable as well.

Before you proceed, make sure to update JEB Pro to version 3.1.1+.

Obfuscated Crypto-stealer Code

The sample we are going to look at monitors Windows clipboards for cryptocurrency-looking wallet addresses, and replaces them with a desired target address. The sample is specifically targeting Ethereum wallet addresses. It is a neutered final stage payload – the recipient address has been scrambled to render the code ineffective.

PE characteristics of file 1.exe

Although the payload is unpacked, what is interesting is that one of its key routines is obfuscated: custom garbage code was inserted.

Junk (useless) assignments

The garbage code is easy to go through: a bit of manual analysis shows that junk instructions are assigning pseudo-random values to an array whose bytes are never used. Two types of assembly patterns are present:

1- mov dword ptr [edi + offset], junk_value ; edi previously init. to
; junkarray address
2- push junk_value
pop dword ptr [junkarray_address + offset]

If we decompile that code and look at the final IR (as shown below), we can see that those instructions ended up being converted and optimized to the following type of assignment:

Assign(Mem(mem_address), Imm(junk_value))

Currently, the decompiled code looks like the following, hard-to-digest blob:

Snippet of decompiled code (obfuscated)

Although quite painful to read, we can follow the program’s logic by abstracting away the junk assignments. (Essentially, win32 functions’ OpenClipboard, GetClipboardData, and SetClipboardData are used to retrieve, check, and replace copy-pasted Ascii and Unicode text, if they match the following pattern “/0x(..){20}/”. The replacement string target wallet address, previously decrypted by sub_401000.)

Cleaning the Intermediate Representation

Recall that the native analysis pipeline can be simplifed as the workflow below:

CodeObject (*)
-> Reconstructed Routines & Data
-> Conversion to IR (low-level, non-optimized)
-> IR Optimizations <--- this is where we'll work
-> Final IR (higher-level, optimized, typed)
-> Generation of AST
-> AST Optimizations
-> Final AST (final, cleaned)
-> High-level output (eg, C variant)

Our custom IR optimizer will look for junk assignments and remove them. The important criteria are: What is the junk array start and end addresses? Is it common to all routines in the binary, or is there one array per routine? Those questions may be hard to answer in the general case. However, for our specific sample file, we can assert with a high-degree of certainty that the junk array:
– starts at address 0x415882
– is at most 256 bytes long
– is used solely by sub_401171, the routine we want to analyze

Because of the above restrictions, the IR optimizer we are going to write should be qualified as a custom or ad-hoc IR optimizer. Chances are, we won’t be able to reuse it as-is in other programs without some amount of tweaking.

Let’s get started, we will:
– create an Eclipse project with scaffold code for a Java back-end plugin
– write and test a custom IR optimizer with a headless client
– deploy the plugin and make it usable and accessible from the UI desktop client

Creating a Plugin Project

Before we proceed, make sure to:

  • Define an environment variable JEB_HOME, that points to your JEB installation folder
  • Install Eclipse IDE

Then:

  • Clone the jeb-native-ir-optimizer-example1 repository.
  • Create an Eclipse project by running:
    • On Windows: create-eclipse-project.cmd
    • On Linux/macOS: create-eclipse-project.sh
  • Open Eclipse and import the newly-created project into your Workspace (File, Import, Existing Projects into the Workspace, select the cloned repository folder, proceed)
Importing an existing project into Eclipse

Debugging the Obfuscation

Now that your project is imported in Eclipse, you should be able to see two source files in src’s default package:

  • Tester.java
  • EOptExample1.java

EOptExample1 is the IR optimizer plugin we will be working on. (Note that several classes of plugins exist, this one is a native IR optimizer, and therefore inherits from AbstractEOptimizer or one its subclasses.)

Tester creates a headless JEB instance that loads the plugin EOptExample1.

Package Explorer view of the newly-created project

Tester.java does the following:

  • Create a JEB instance 1
  • Load the test plugin EOptExample1
  • Then, create a JEB project and load the artifact file samples/1.exe (IMPORTANT: unzip 1.zip to 1.exe first – password: password)
  • Analyze the artifact
  • Retrieve a handle on the native decompiler
  • Retrieve a handle on the to-be-analyzed routine sub_401171
  • Perform a full decompilation of that routine

Let’s have a preliminary look at EOptExample1: This IR optimizer type is set to STANDARD, which is not ideal when you use custom optimizers tailored for specific code. A better IR optimizer class for those is ON_DEMAND: those optimizers are to be manually invoked, e.g. from JEB UI (menu: File, Advanced Unit Options). However, during development, since we are focusing on a particular file and routine, STANDARD type may be fine. Standard optimizers are called during regular IR optimization phases of the decompilation pipeline.

public class EOptExample1 extends AbstractEOptimizer {

    public EOptExample1() {
        super(DataChainsUpdatePolicy.UPDATE_IF_OPTIMIZED);
        getPluginInformation().setName("Sample IR Optimizer #1");
        getPluginInformation().setDescription("Remove IR-statements reduced to \"*(&amp;garbage + delta) = xxx\"");
        getPluginInformation().setVersion(Version.create(1, 0, 0));
        // Standard optimizers are normally run, as part of the IR optimization stages in the decompilation pipeline
        setType(OptimizerType.STANDARD);
    }

    // replace all IR statements previously reduced to EMem ("[junk_address] = xxx") to ENop
    @Override
    public int perform(boolean updateDFA) {
        logger.info("IR-CFG before running custom optimizer \"%s\":\n%s", getName(),
                DecompilerUtil.formatIRCFGWithContext(2, cfg, ectx));
        // ...
        // optimizer code
    }
}

Note the plugin’s data-chains update policy, set to UPDATE_IF_OPTIMIZED. Optimizations that specify this flag tell their runner, aka the master optimizer that orchestrate them, that identifiers may be modified – hence, if optimizations occurred, a data flow analysis (DFA) pass needs to take place again. DFA update policies are a topic for another article.

Lines 3-5 are plugin metadata information, such as name and description, authorship, version numbers (including minimum/maximum JEB back-end versions), etc.

Before we deep-dive into perform(), let’s first set a breakpoint on line 15, where logger.info(…) is called. Then, start a debugging session for Tester: menu Run, command Debug (hotkey: F11.)

After a few seconds of analysis, your breakpoint should be hit; it corresponds to the first-time invocation of your custom optimizer. The logger prints out the IR-CFG that’s about to be optimized. Let’s have a look at it:

IR-CFG before running custom optimizer "Sample IR Optimizer #1":
>> IN(@0): ecx={@D} esp={@0} ebp={@1} ss={@1,@C,@18,@1D,@21,@24,@25,@27,@30,@35,@38,@3B,@3E,@3F,@41,@43,@46,@4F,@51,@54,@56,@59,@5C,@5D,@5F,@6B,@77,@81,@84,@9B,@9E,@A0,@AC,@B8,@BA,@BD,@BF,@C3,@C5,@C7,@CB,@CD,@D1,@D2,@D4,@E0,@E9,@EF,@F1,@F5,@F7,@FB,@FC,@FE,@100,@103,@106,@107,@109,@10C,@10E,@112,@114,@116,@11A,@11C,@11F,@122,@123,@12E,@131,@133,@137,@139,@13D,@13E,@140,@143,@145,@149,@14B,@14F,@150,@152,@15D,@173,@176,@179,@17C,@17D,@17F,@181,@18A,@18C,@18F,@191,@194,@196,@19A,@19D,@19E,@1A0,@1B3,@1BF,@1C2,@1D9,@1DC,@1E0,@1EC,@1EF,@1F1,@1F3,@1F7,@1F9,@1FC,@1FF,@202,@203,@205,@211,@21D,@220,@222,@226,@227,@229,@22B,@22E,@231,@232,@234,@237,@23A,@23C,@23E,@242,@244,@246,@24A,@24C,@250,@251,@25C,@25F,@262,@263,@265,@268,@26A,@26E,@271,@272,@27A,@27F,@295,@298,@29A,@29D,@2A4,@2A9,@2AD,@2B0,@2B2,@2B6,@2B7,@2BA,@2BC,@2C0,@2C2,@2C6,@2C7,@2CA,@2CD,@2D2,@2DE,@2E4,@2E7,@2E8,@2EA} ds={@F,@11,@19,@1E,@22,@28,@31,@36,@39,@3C,@40,@44,@4C,@4E,@52,@55,@57,@5A,@60,@6C,@74,@78,@7B,@82,@85,@86,@87,@8E,@90,@92,@9C,@A1,@A3,@A4,@AD,@B5,@B7,@BB,@C0,@C4,@C8,@CE,@D5,@E1,@EA,@ED,@F2,@F8,@FD,@FF,@101,@104,@10A,@10F,@113,@117,@11B,@11D,@120,@124,@12F,@134,@13A,@13F,@141,@146,@14C,@153,@15A,@15C,@163,@165,@166,@167,@170,@171,@174,@177,@17A,@17E,@180,@187,@189,@18D,@192,@197,@19B,@1A1,@1AB,@1B4,@1B7,@1B9,@1C0,@1C3,@1C4,@1C5,@1CC,@1CE,@1D0,@1DA,@1DD,@1DE,@1E1,@1E9,@1ED,@1F0,@1F4,@1FA,@1FD,@200,@206,@212,@219,@21B,@21E,@223,@228,@22A,@22C,@22F,@235,@238,@23B,@23F,@243,@247,@24D,@252,@25B,@25D,@260,@264,@266,@26B,@26F,@273,@27B,@27E,@285,@287,@288,@289,@292,@293,@296,@29B,@2A5,@2AA,@2AE,@2B3,@2B8,@2BD,@2C3,@2C8,@2CE,@2D0,@2D3,@2DF,@2E1,@2E2,@2E5,@2EB} OpenClipboard={@25} GetClipboardData={@3F,@17D} GlobalAlloc={@FC,@227} GlobalLock={@107,@232} GlobalUnlock={@13E,@263} SetClipboardData={@150,@272,@2B7,@2C7} CloseClipboard={@2CB} Sleep={@2E8} sub_401000={@D} sub_405010={@5D} sub_404F80={@D2} sub_4024E0={@123,@251} sub_404E54={@19E} sub_404E14={@203} 
0000/1>  s32:_esp = (s32:_esp - i32:00000004h)                                                                 DU: esp={@1,@2,@B}                 | UD: esp={} 
0001/1:  32<s16:_ss>[s32:_esp] = s32:_ebp                                                                      DU:                                | UD: esp={@0} ebp={} ss={} 
0002/9:  s32:_ebp = s32:_esp                                                                                   DU: ebp={@38,@41,@46,@4F,@54,@56,@84,@9E,@B8,@BD,@C5,@FE,@100,@10C,@114,@11C,@131,@140,@15D,@176,@17F,@181,@18A,@18F,@194,@1C2,@1DC,@1EF,@1F1,@1FC,@229,@22B,@237,@23C,@244,@25C,@265,@27F,@298,@29D}  | UD: esp={@0} 
000B/1:  s32:_esp = (s32:_esp - i32:0000002Ch)                                                                 DU: esp={@C,@D,@17}                | UD: esp={@0} 
000C/1:  32<s16:_ss>[s32:_esp] = i32:0040117Ch                                                                 DU:                                | UD: esp={@B} ss={} 
000D/1:  call s32:_sub_401000(s32:_ecx)->(s32:_eax){32[s32:_esp]}                                              DU: eax={}                         | UD: ecx={} esp={@B} sub_401000={} 
000E/1+  s32:_edi = i32:00415882h                                                                              DU: edi={}                         | UD: 
000F/1:  32<s16:_ds>[i32:00415944h] = i32:E2E60682h                                                            DU:                                | UD: ds={} 
0010/1:  s32:_eax = i32:00000001h                                                                              DU: eax={}                         | UD: 
0011/6:  32<s16:_ds>[i32:00415904h] = i32:7C64C0E4h                                                            DU:                                | UD: ds={} 
0017/1:  s32:_esp = (s32:_esp - i32:00000004h)                                                                 DU: esp={@18,@1A}                  | UD: esp={@B,@2EC} 
0018/1:  32<s16:_ss>[s32:_esp] = i32:E87A1612h                                                                 DU:                                | UD: esp={@17} ss={} 
0019/1:  32<s16:_ds>[i32:004158DDh] = i32:E87A1612h                                                            DU:                                | UD: ds={} 
001A/1:  s32:_esp = (s32:_esp + i32:00000004h)                                                                 DU: esp={@1C}                      | UD: esp={@17} 
001B/1:  nop                                                                                                   DU:                                | UD: 
001C/1+  s32:_esp = (s32:_esp - i32:00000004h)                                                                 DU: esp={@1D,@20}                  | UD: esp={@1A} 
001D/1:  32<s16:_ss>[s32:_esp] = i32:CCA4A4A0h                                                                 DU:                                | UD: esp={@1C} ss={} 
001E/2:  32<s16:_ds>[i32:004158CAh] = i32:CCA4A4A0h                                                            DU:                                | UD: ds={} 
0020/1:  s32:_esp = s32:_esp                                                                                   DU: esp={@21,@23}                  | UD: esp={@1C} 
0021/1:  32<s16:_ss>[s32:_esp] = i32:00000000h                                                                 DU:                                | UD: esp={@20} ss={} 
0022/1:  32<s16:_ds>[i32:00415951h] = i32:249E4228h                                                            DU:                                | UD: ds={} 
0023/1:  s32:_esp = (s32:_esp - i32:00000004h)                                                                 DU: esp={@24,@25,@26}              | UD: esp={@20} 
0024/1:  32<s16:_ss>[s32:_esp] = i32:004011CAh                                                                 DU:                                | UD: esp={@23} ss={} 
0025/1:  call s32:_OpenClipboard(32<s16:_ss>[(s32:_esp + i32:00000004h)])->(s32:_eax){32[s32:_esp]}            DU: eax={@33}                      | UD: esp={@23} ss={} OpenClipboard={} 
...
... (trimmed)
...

The above IR listing is a human-friendly representation of IR statements. The general format of this listing is:

cnt   what
1 >> IN(@EntryOffset){live_inputs}
1* offset/lengthC <insn> | DU:<def-use-chains> UD:<use-def-chains>
0+ << OUT(@ExitOffset){reaching_outputs}


- offset: IR statement offset
- length: IR statement length (generally, 1)
- C: indicates whether the instruction is
- the entry-point instruction (>)
- the first of a basic-block (+)
- any other instruction (:)
- insn: IR statement instruction (refer to Part 1 of this blog series)
- DU/UD: routine def-use and use-def chains
- IN: live input variables at the entry-point
- OUT: reaching output variables at a given exit point
We breakpoint’ed on logger.info(), and single-stepped one line. The output can be seen in the console view. It may be better (depending on how large your console buffer is) to examine the full output dumped to jeb-plugin-tester.log in your Temp folder.

The IR listing is relatively readable, although quite verbose at this early stage of optimization (roughly, the first pass in tier 1 of the analysis pipeline). The important idioms to look at here are:

Preliminary conversion of low-level junk inserts

a/ The first one is an Assign(Mem(Imm), Imm), which corresponds to optimized “mov [edi + offset], value”, where the value of edi was determined, propagated further, and the addition folded and converted to an immediate address.

b/ The second one is a partially optimized “push value / pop [address]”. Later optimizations phases will find and remove esp updates or esp-based operations, as was shown in the pseudo-code earlier. What we need to focus on here is the Assign(Mem(Imm), Imm), like the one in a/.

Those are the bits we will look for and modify: Assuming those assignments are useless, we will simply replace them by Nop statements.

Writing the Optimizer

At this point, our preliminary understanding of the obfuscation is enough to start writing the clean-up optimizer. Its code is extremely simple, for two main reasons:
– The obfuscation scheme itself is relatively trivial
– Other built-in JEB optimizers are giving us clean IR assignments to work on

Let’s look at the code of proceed():

    @Override
    public int perform(boolean updateDFA) {
        final long garbageStart = 0x415882;
        final long garbageEnd = garbageStart + 0x100;        
        int cnt = 0;
        for(int iblk = 0; iblk < cfg.size(); iblk++) {
            BasicBlock<IEStatement> b = cfg.get(iblk);
            for(int i = 0; i < b.size(); i++) {
                IEStatement stm = b.get(i);
                if(!(stm instanceof IEAssign)) {
                    continue;
                }
                IEAssign asg = (IEAssign)stm;
                if(!(asg.getLeftOperand() instanceof IEMem)) {
                    continue;
                }
                IEMem target = (IEMem)asg.getLeftOperand();
                if(!(target.getReference() instanceof IEImm)) {
                    continue;
                };
                IEImm wraddr = (IEImm)target.getReference();
                if(!wraddr.canReadAsAddress()) {
                    continue;
                }
                long addr = wraddr.getValueAsAddress();
                if(addr < garbageStart || addr >= garbageEnd) {
                    continue;
                }
                b.set(i, ectx.createNop(stm));
                cnt++;
            }
        }
        return postPerform(updateDFA, cnt);
    }

This optimizer inherits from AbstractEOptimizer. Therefore, the perform() method works on an IR-CFG. (Not all optimizers may choose to do so; it is sometimes easier to work directly on statements or expressions.)

process() goes through all statements or every basic block of the IR-CFG. Using the instanceof operator, we check that the statement is an assignment such as: Mem(address) = Imm. The address is retrieved, and we make sure that it falls within the junk array. If those checks succeed, we replace the assignment by a Nop.

And that is it. Clean and simple – although, not quite portable, since the junk array address and size are hard-coded into the code! But that is not the point of this blog, and neither is portability a first-class goal when writing optimizers for custom code.

Next up, let’s see how to use the plugin in an interactive session using the desktop client.

Building, Deploying, Interactive Use

In order to use the optimizer within the JEB desktop client, we either:

  • Register the plugin as a development plugin;
  • Or build the plugin as a Jar and drop it in JEB’s coreplugins/ folder.

Development Plugin

This is the easiest option. You may consider it as an intermediate step between prototyping with the headless client, as demonstrated above, and a full-blown, deployed Jar plugin.

Open the Options panel, Development tab, tick the option “Development Mode”, add the bin/ folder of your plugin’s project to the classpath, and add the classname of your plugin entry-point:

Setting up a development plugin in JEB UI

Press OK and restart JEB. Your plugin will be loaded and ready to use. You may now skip to the section “Using the IR optimizer plugin”.

Building a Jar plugin

The alternative is to run build.cmd (on Windows) or build.sh (on Linux/macOs), which calls an Ant script in the scripts/ folder, therefore, make sure to have Ant installed on your system first. You may also customize the plugin name and version before building.

The resulting Jar plugin file will be generated in your project’s out/ folder. Copy it to your JEB coreplugins/ folder and start the JEB client. Your plugin will be automatically loaded, along with the other plugins.

Using the IR Optimizer Plugin

If your plugin has the type STANDARD (default), then, as explained earlier, it will be invoked by the optimizations’ orchestrator automatically, at various times during the decompilation pipeline. If that’s the mode you’d like to choose, make sure that your plugin is generic enough to handle all types of input routines, else you’re in for some strange surprises if you ever forget to remove it from your coreplugins/ folder.

An alternative is to convert it to an on-demand plugin:

public EOptExample1() {
        super(DataChainsUpdatePolicy.UPDATE_IF_OPTIMIZED);
        getPluginInformation().setName("Sample IR Optimizer #1");
        getPluginInformation().setDescription("Remove IR-statements reduced to \"*(&amp;garbage + delta) = xxx\"");
        getPluginInformation().setVersion(Version.create(1, 0, 0));

        // Standard optimizers are normally run, as part of the IR optimization stages in the decompilation pipeline
        //setType(OptimizerType.STANDARD);

        // alternative (better for production / in UI use):
        setType(OptimizerType.ON_DEMAND);
        setPreferredExecutionStage(-NativeDecompilationStage.LIFTING_COMPLETED.getId());
        setPostProcessingActionFlags(PPA_OPTIMIZATION_PASS_FULL);
    }

– Line 11 makes the optimizer on-demand. Users must manually activate it, on specific code.
– Line 12 is recommended for on-demand optimizers: we specify at which point in in the pipeline the plugin should be called.
– Finally, we set some post-processing flags, specifying that a full round of standard optimizations must be performed after our custom optimizer has run: this will allow cleaning up code remnants, and optimize our IR-CFG further – something made possible after running an optimization pass like this one.

On-demand optimizer plugins show up in the File, Advanced Unit Options dialog box, that you may bring up when a decompiled routine has the focus:

List of on-demand optimizers managed by a given decompiler instance

Tick the optimizer box, press OK. The routine will be re-decompiled.

Clean Code

Regardless which method you choose, once cleaned up, the IR will allow for better downstream pipeline phases, including typing, AST generation, AST optimizations, etc.

The pseudo-C code has become quite readable:

The same decompiled method, after deobfuscation by the custom plugin.

Conclusion

That is it for part 2. We scratched the surface of IR optimizers (which themselves are a relatively small – albeit important – part of the overall decompilation pipeline 2) but it’s a good start. I strongly encourage you to experiment and ask your questions on our Slack channel. One ongoing effort right now is to bring the API documentation up to speed in terms of contents and sample code.

In part 3, we will continue exploring IR optimizers. Later on in the series, we will show how to write AST optimizers 3, how to write decompilation modules, and show how existing decompilers can be cutomized further. Stay tuned!

  1. JEB must have been previously run, at least once: EULA accepted, license key generated, etc.
  2. The decompilation pipeline is one component of the native analysis pipeline, which is one module, among tens, of the JEB back-end: the public API is worth exploring if you’re into advanced use cases.
  3. AST generation is one of the very final decompilation phases – working on the syntax tree serves different purposes than working on the IR

A new APK Resources Decoder with de-Obfuscation Capabilities

The latest JEB release ships with our all-new Android resources (ARSC) decoder, designed to reliably handle tweaked, obfuscated, and sometimes malformed resource files.

As it appears that optimizing resources for space (eg, the WeChat team has made their compressor/refactoring module publicly available,  etc.) or complexity (eg, commercial app protectors have been doing it for some time now) is becoming more and more commonplace, we hope that our users will come to appreciate this new module.

Here are the key points, followed by examples of what to expect from the new module.

ARSC Decoder Workflow

In terms of workflow, nothing changes: starting with JEB 2.3.10, the new Android Resources decoder module is enable by default.

If you ever need to switch back to the legacy module, simply open the Options, Advanced panel, filter on AndroidResourcesDecoderSelector and set the value to 1 (instead of 2).

ARSC Decoder Output

In terms of output, users should see improvements in at least three areas:

  • First, the module can deal with obfuscated resources and malformed files better, resulting in lower failure rates. Ideally, we’d like to get as close as possible to a 0-failure, so please report issues!
  • Second, flattened, renamed, or generally refactored resources are handled as well, and the original res/ folder will be reconstructed, resulting in a readable Resources sub-tree.
  • Finally, the module can generate an aapt2-like text output to cope with the limitations of AOSP’s aapt/aapt2 (eg, crashes); the output can be quite large, so currently, aapt2-like output generation is disabled by default. To enable it,  go to the Options, Advanced panel, filter on AndroidResourcesGenerateAapt2LikeOutput and set the value to true.  The output will be visible as an additional fragment of the APK unit view:

aapt2-like output on a file that failed aapt2

Additional Input (APK Frameworks)

By default, the latest Android framework (currently API 27) is dropped by JEB in [HOME_FOLDER]/.jeb-android-frameworks/1.apk.

If an app you are analyzing requires additional framework libraries, drop them as [package_id].apk in that folder, and you should be good to go.

Example 1: flattened resources in a banking app

Here’s a sample that demonstrates what the output looks like with an app found on VirusTotal. The app is called itsme and is produced by the ING bank. It may have been obfuscated by GuardSquare’s DexGuard 1, which performs extensive resources refactoring (res/ folder flattening) and trimming (renaming of files, name-less resource objects, etc.).

Have a look at the APK contents:

Protected app contents

aapt2 fails on it (resource id overlap):

error: trying to add resource 'be.bmid.itsme:attr/' with ID 0x7f010001 but resource already has ID 0x7f010000.

apktool 2.3.1 cannot reconstruct the resource tree either. Resources are moved to an unknown/ folder; on non-Linux system, resources manipulation also fail due to illegal character names.

JEB does its best to rebuild the resources tree, and renames illegally named resources as well across the Resources base, consistently:

A rebuilt resources tree, originally obfuscated by GuardSquare (?)

Example 2: tweaked xml

The second file is a version of the Xapo Bitcoin wallet app 2, also found on VirusTotal. This app does not fail aapt2, however, it does fail other tools, including apktool 2.3.1

I: Using Apktool 2.3.1 on 96cbabe2fb11c78a283348b2f759dc742f18368e0d65c5d0a15aefb4e0bdc645
I: Loading resource table...
I: Decoding AndroidManifest.xml with resources...
I: Loading resource table from file: [...]/1.apk
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 8601

The resources are flattened and renamed; the XML resources are oddly structured and stretch the XML specifications as well.

JEB handles things smoothly.

Conclusion

There are many more examples of “stretched” resources in APKs we’ve come across, however we cannot share them at the moment.

If you come across unsupported scenarios or bugs, feel free to issue a report, we’ll happily investigate and update the module.

  1. https://www.guardsquare.com/en/dexguard
  2. https://xapo.com/

Having Fun with Obfuscated Mach-O Files

Last week was the release of JEB 2.3.7 with a brand new parser for Mach-O, the executable file format of Apple’s macOS and iOS operating systems. This file format, like its cousins PE and ELF, contains a lot of technical peculiarities and implementing a reliable parser is not a trivial task.

During the journey leading to this first Mach-O release, we encountered some interesting executables. This short blog post is about one of them, which uses some Mach-O features to make reverse-engineering harder.

Recon

The executable in question belongs to a well-known adware family dubbed InstallCore, which is usually bundled with others applications to display ads to the users.

The sample we will be using in this post is the following:

57e4ce2f2f1262f442effc118993058f541cf3fd: Mach-O 64-bit x86_64 executable

Let’s first take a look at the Mach-O sections:

Figure 1 – Mach-O sections

Interestingly, there are some sections related to the Objective-C language (“__objc_…”). Roughly summarized, Objective-C was the main programming language for OS X and iOS applications prior the introduction of Swift. It adds some object-oriented features around C, and it can be difficult to analyze at first, in particular because of its way to call methods by “sending messages”.

Nevertheless, the good news is that Objective-C binaries usually come with a lot of meta-data describing methods and classes, which are used by Objective-C runtime to implement the message passing. These metadata are stored in the “__objc_…” sections previously mentioned, and the JEB Mach-O parser process them to find and properly name Objective-C methods.

After the initial analysis, JEB leaves us at the entry point of the program (the yellow line below):

Figure 2 – Entry point

Wait a minute… there is no routine here and it is not even correct x86-64 machine code!

Most of the detected routines do not look good either; first, there are a few objective-C methods with random looking names like this one:

Figure 3 – Objective-C method

Again the code makes very little sense…

Then comes around 50 native routines, whose code can also clearly not be executed “as is”, for example:

Figure 4 – Native routine

Moreover, there are no cross-references on any of these routines! Why would JEB disassembler engine – which follows a recursive algorithm combined with heuristics – even think there are routines here?!

Time for a Deep Dive

Code Versus Data

First, let’s deal with the numerous unreferenced routines containing no correct machine code. After some digging, we found that they are declared in the LC_FUNCTION_STARTS Mach-O command – “command” being Mach-O word for an entry in the file header.

This command provides a table containing function entry-points in the executable. It allows for example debuggers to know function boundaries without symbols. At first, this may seem like a blessing for program analysis tools, because distinguishing code from data in a stripped executable is usually a hard problem, to say the least. And hence JEB, like other analysis tools, uses this command to enrich its analysis.

But this gift from Mach-O comes with a drawback: nothing prevents miscreants to declare function entry points where there are none, and analysis tools will end up analyzing random data as code.

In this binary, all routines declared in LC_FUNCTION_STARTS command are actually not executable. Knowing that, we can simply remove the command from the Mach-O header (i.e. nullified the entry), and ask JEB to re-analyze the file, to ease the reading of the disassembly. We end up with a much shorter routine list:

Figure 5 – Routine list

The remaining routines are mostly Objective-C methods declared in the metadata. Once again, nothing prevents developers to forge these metadata to declare method entry points in data. For now, let’s keep those methods here and focus on a more pressing question…

Where Is the Entry Point?

The entry point value used by JEB comes from the LC_UNIXTHREAD command contained in the Mach-O header, which specifies a CPU state to load at startup. How could this program be even executable if the declared entry point is not correct machine code (see Figure 2)?

Surely, there has to be another entry point, which is executed first. There is one indeed, and it has to do with the way the Objective-C runtime initializes the classes. An Objective-C class can implement a method named “+load” — the + means this is a class method, rather than an instance method –, which will be called during the executable initialization, that is before the program main() function will be executed.

If we look back at Figure 5, we see that among the random looking method names there is one class with this famous +load method, and here is the beginning of its code:

Figure 6 – +load method

Finally, some decent looking machine code! We just found the real entry point of the binary, and now the adventure can really begin…

That’s it for today, stay tuned for more technical sweetness on JEB blog!

Defeating AppSolid Android application protector

AppSolid is a cloud-based service designed to protect Android apps against reverse-engineering. According to the SEworks’ website, AppSolid is both a vulnerability scanner as well as a protector and metrics tracker. 1 This post focuses on the protector part only.

Note: AppSolid offers various protection options. However, based on our limited testing, it turns out that protected apps can be easily un-protected. The founder of SEworks said on the ProductHunt’s description page: “We looked at Proguard and Dexguard for our own app, and source code obfuscation just wasn’t enough (plus that makes crash logs hard to read), and solutions like Arxan cost a lot of money and required weeks of our time to integrate into the app. ” While the above is essentially true, the implication is that AppSolid fares better than “source code obfuscation”.  He adds: “Why not create a shield that goes around the app to protect it, and is easy to use?” – that is the core issue that AppSolid is facing. A commenter added that “(AppSolid) does a good job of protecting against source code analysis through app decompiling”. That claim is false, but remained undisputed on the ProductHunt page.

This blog shows how to retrieve the original bytecode of a protected application. Grab the latest version of JEB (2.2.5, released today) if you’d like to try this yourself.

Bytecode Component

Once protected, the Android app is wrapped around a custom DEX and set of SO native files. The manifest is transformed as follows:

Manifest of a protected app. The red boxes indicate additions. The red line indicates removal of the LAUNCHER category from the original starting activity

  • The package name remains unchanged
  • The application entry is augmented with a name attribute; the name attribute references an android.app.Application class that is called when the app is initialized (that is, before activities’ onCreate)
  • The activity list also remain the same, with the exception of the MAIN category-filtered activity (the one triggered when a user opens the app from a launcher)
  • A couple of Appsolid activity are added, mainly the com.seworks.medusah.MainActivity, filtered as the MAIN one

Note that the app is not debuggable, but JEB handles that just fine on ARM architectures (both for the bytecode and the native code components). You will need a rooted phone though.

The app structure itself changes quite a bit. Most notably, the original DEX code is gone.

Structure of the protected app. The fake PNG file contains encrypted assets of the original app, which are handled by libmd.so.

  • An native library was inserted and is responsible for retrieving and extracting the original DEX file. It also performs various anti-debugging tricks designed to thwart debuggers (JEB is equipped to deal with those)
  • A fake PNG image file contains an encrypted copy of the original DEX file; that file will be pulled out and swapped in the app process during the unwrapping process

Snippet of high_rezolution.png – 0xDEADCODE

Upon starting the protected app, a com.seworks.medusah.app object is instantiated. The first method executed is not onCreate(), but attachBaseContext(), which is overloaded by the wrapper. There, libmd is initialized and loadDexWithFixedkeyInThread() is called to process the encrypted resources. (Other methods and classes refer to more decryption routines, but they are unused remnants in our test app. 2)

Application.attachBaseContext() override is the true entry-point of a protected app

Unwrapper thread

Calling into the native file for decryption and swapping

The rest of the “app” object are simple proxy overrides for the Application object. The overrides will call into the original application’s Application object, if there was one to begin with (which was not the case for our test app.)

Proxy stubs to the original Application’s object

The remaining components of the AppSolid DEX file are:

  • Setters and getters via reflection to retrieve system data such as package information, as well as stitch back the original app after it’s been swapped in to memory by the native component.
  • The main activity com.seworks.medusah.MainActivity, used to start the original app main activity and collect errors reported by the native component.

Native Component

The protected app shipped with 3 native libraries, compiled for ARM and ARM v7. (That means the app cannot run on systems not supporting these ABIs.) We will focus on the core decryption methods only.

As seen above, the decryption code is called via:

m = new MedusahDex().LoadDexWithFixedkeyInThread(
    getApplicationInfo(), getAssets(),
    getClassLoader(), getBaseContext(),
    getPackageName(), mHandler);

Briefly, this routine does the following:

  • Retrieve the “high_resolution.png” asset using the native Assets manager
  • Decrypt and generate paths relative to the application
    • Permission bits are modified in an attempt to prevent debuggers and other tools (such as run-as) to access the application folder in /data/data
  • Decrypt and decompress the original application’s DEX file resource
    • The encryption scheme is the well-known RC4 algorithm
    • The compression method is the lesser-known, but lightning fast LZ4
    • More about the decryption key below
  • The original DEX file is then dumped to disk, before the next stage takes place (dex2oat’ing, irrelevant in the context of this post)
  • The DEX file is eventually discarded from disk

Retrieving the decryption key statically appears to be quite difficult, as it is derived from the hash of various inputs, application-specific data bits, as well as a hard-coded string within libmd.so. It is unclear if this string is randomly inserted during the APK protection process, on the server side; verifying this would require multiple protected versions of the same app, which we do not have.

ARM breakpoint on DecryptFileWithFixedKey(), step over, and destination file in r6

A dynamic approach is better suited. Using JEB, we can simply set a breakpoint right after the decryption routine, retrieve the original DEX file from disk, and terminate the app.

The native code is fairly standard. A couple of routines have been flattened (control-flow graph flattening) using llvm-obfuscator. Nothing notable, aside from their unconventional use of an asymmetric cipher to further obscure the creation of various small strings. See below for more details, or skip to the demo video directly.

Technical note: a simple example of white-box cryptography

The md library makes use of various encryption routines. A relatively interesting custom encryption routine uses RSA in an unconventional way. Given phi(n) [abbreviated phi] and the public exponent e, the method brute-forces the private exponent d, given that:
d x e = 1 (mod phi)

phi is picked small (20) making the discovery of d easy (3).

Find d given phi and e, then decrypt using (d, n)

Decryption (p = c^d (mod n)) … with a twist

The above is a simple example of white-box cryptography, in which the decryption keys are obscured and the algorithm customized and used unconventionally. At the end of the day, none of it matters though: if the original application’s code is not protected, it – or part of it – will exist in plain text at some point during the process lifetime.

Demo

The following short video shows how to use the Dalvik and ARM debuggers to retrieve the original DEX file.

This task can be easily automated by using the JEB debuggers API. Have a look at this other blog post to get started on using the API.

Conclusion

The Jar file aj_s.jar contains the original DEX file with a couple of additions, neatly stored in a separate package, meant to monitor the app while it is running – those have not been thoroughly investigated.

Overall, while the techniques used (anti-debugging tricks, white box cryptography) can delay reverse engineering, the original bytecode could be recovered. Due to the limited scope of this post, focusing on a single simple application, we cannot definitively assert that the protector part of AppSolid is broken. However, the nature of the protector itself points to its fundamental weakness: a wrapper, even a sophisticated one, remains a wrapper.

  1. “SCAN. Scan your app for any security vulnerabilities. Get results in seconds. PROTECT. Binary protection applied in minutes. No additional coding necessary. TRACK. Monitor your app’s security status in real time. Stop hacking before it happens.” retrieved from https://appsolid.co/ on May 18
  2. Appsolid bytecode and native components could use a serious clean-up though, debugging symbols and unused code were left out at the time of this analysis.

Dexguard’s assets encryption

Continuing the post on Dexguard’s protection mechanisms, let’s have a look at another feature advertised on Saikoa’s website, assets encryption:

Encrypt assets. Hide important data as well. DexGuard can encrypt your asset files transparently, so hackers won’t just run away with them.”

Let’s have a look at an app containing a protected asset, 1.png. The hexdump below shows the encrypted 1.png. The PNG header is gone, as this is the encrypted version of the file.

1

The decryption routine is easy to spot: it appears the AssetManager.open() call was replaced by inline code that uses encrypted strings and reflection to instantiate a Cipher class and decrypt the asset file. We will use the script from the previous post to decode the strings and determine exactly what is happening. The screenshot below shows the decompiled routine in question, after it was marked up and commented using JEB (try-catch have been disabled for improved clarity, as they do not help understand how the routine works):

1

The references to the AssetManager class, open method, asset filename 1.png as well as cipher type AES-CFB have been encrypted using the string encryption feature. The decryption is achieved using standard Android classes, and the scheme used is AES with a random key and IV (outlined in red in.)

The final call to InputStream.available() is actually what the original method was doing. Therefore, the protection mechanism replaced AssetManager.open() by the following section of code:

2

We conclude that manually restoring protected assets is not difficult, although it cannot be easily automated. One way to improve that protection scheme would be to use custom decryption routines instead of SDK-provided classes, and/or generalize the use of reflection to reference the decryption code.

A look inside Dexguard

Dexguard is the commercial alter-ego of Proguard, for Android applications. While Proguard provides name obfuscation and unused code paths removal, Dexguard can also encrypt strings, classes, and assets, add tamper detection code, and use Java reflection facility to deeply obfuscate what a piece of code does.

Let’s have a quick look at one feature of Dexguard, the string encryption mechanism. The app in question is a Dexguard’ed version of Cyanide.

First, access to strings inside a protected class are replaced by calls to a static private decryption routine. That routine accesses a static private array containing all protected strings within the class. That array is created by the class initializer upon loading:

1

The decryption routine usually takes 3 arguments, although in some simpler cases, it may only take two arguments. The arguments may also be shuffled. (Note: from a Dalvik standpoint, that routine is oddly structured. Although JEB’s decompiled output is structurally fine, a for-loop construct cannot be created, mainly because of the irregular jump inside the first if-cond branch.)

The screenshot below represents that decryption routine. The name was obfuscated by Dexguard. Notice the constant numerals that have been circled in that code snippet: they are the decryption routine characteristics.

2

The following screenshot shows the same piece of code after marking it up. As you can see, the encryption algorithm is fairly simple: the current output character is derived from the previous character as well as the current byte of the encryptedStrings array. A constant (-8 in this case) is added to yield the final character. The initial length (length) as well as initial character (curChar) are also offset by hard-coded constants. These three constants (0x3E, 0x199, -8) characterize the decryption routine, and are unique for each routine. The curChar parameter acts as the decryption key.

3

You may customize this Python script to decrypt Dexguard strings:

#!/usr/bin/python
# decryptor for Dexguard encrypted strings

# customize
strings = [0x4C, 0xE, 2, 9, -7, 0x10, -54, 0x3E, 0x17, -9, -44, 0x4C, 0xA, ...]
c0, c1, c2 = (0x199, 0x3E, -8)

def decrypt(length, curChar, pos):
  length += c0
  curChar += c1
  r = ''
  for i in range(length):
    r += chr(curChar & 0xFF)
    curEncodedChar = strings[pos]
    pos += 1
    curChar = curChar + curEncodedChar + c2
  return r

Combined with reflection (another feature of the protector), the protection can be pretty effective.