Reversing DexGuard, Part 3 – Code Virtualization

Reversing DexGuard: Part 1, Part 2, Part 3

The third part of this series on DexGuard is about bytecode virtualization. The analyses that follow were done statically using JEB 3.22.

Bytecode virtualization is, in my opinion, the most interesting and technically challenging feature of DexGuard. According to GuardSquare’s website, it was introduced in version 8.3 (mid 2019).

TL;DR: JEB Pro can “unvirtualize” methods virtualized by DexGuard 8.3+.

What Is Code Virtualization

Relatively novel, code virtualization is possibly one of the most effective protection technique there is 1. With it come relatively heavy disadvantages, such as hampered speed of execution 2 and the difficulty to troubleshoot production code. The advantages are heightened reverse-engineering hurdles over other more traditional software protection techniques.

Virtualization in the context of code protection means:

  • Generating a virtual machine M1
  • Translating an original code object C0 meant to be executed on a machine M0 3, into a semantically-equivalent code object C1, to be run on M1.

While the general features of M1 are likely to be fixed (e.g., all generations of M1 are stack machines with such and such characteristics), the Instruction Set Architecture (ISA) of M1 may not necessarily be. For example, opcodes, microcodes and their implementation may vary from generation to generation. As for C1, the characteristics of a generation are only constrained by the capabilities of the converter. Needless to say, standard obfuscation techniques can be applied on C1. The virtualization process can possibly be recursive (C1 could be a VM implementing the specifications of a machine M2, executing a code object C2, emulating the original behavior of C0, etc.).

All in all, in practice, this makes M1 and C1 unique and hard to reverse-engineer.

Before and after virtualization of a code object C0 into C1

Example of a Protected Method

Note: all identifier names had been obfuscated, as is almost always the case with DexGuard. They were renamed for clarity and understanding.

Below, the class VClass was found to be “virtualized”. A virtualized class means that all non-constructor (all but <init>(*)V and <clinit>()V) methods were virtualized.

Interestingly, DexGuard does not seem to virtualize constructors

The method d(byte[])byte[] is virtualized:

  • It was converted into an interpreter loop over two large switch constructs that branch on pseudo-code entries stored in the local array pcode.
  • A PCodeVM class was added. It is a modified stack-based virtual machine (more below) that performs basic load/store operations, custom loads/stores, as well as some arithmetic, binary and logical operations.
Virtualized method. Note the pcode array. The opcode handlers are located in two switches. This picture shows the second switch, used to handle specific operations and API calls.

A snippet of the p-code VM class. Full code here, also contains the virtualized class.

The generic interpreter is called via vm.exec(opcode). Execution falls back to a second switch entry, in the virtualized method, if the operation was not handled.

Please refer to the gist linked above for a full list of “generic” VM operations. Three examples, including one showing that the operations are not as generic as the term implies:

(specific to this VM) opcode 6, used to peek the most recently pushed object
(specific to this VM) opcode 8, a push-int operation
(specific to this VM) opcode 23 is relatively specialized, it implements an add-xor stack operation (pop, pop, push). It is quite interesting to see that the protection system does not simply generate one-to-one, dalvik-to-VM opcodes. Instead, the target routine is thoroughly analyzed, most likely lifted, high-level (compounded) arithmetic operations isolated, and pseudo-generic (in PCodeVM) or specialized (in the virtualized method) opcodes generated.

As said, negative opcodes represent custom operations specific to a virtualized method, including control flow changes. An example:

opcode -25: a if(a >=b) goto LABEL operation (first, call into opcode 55 to do a GE operation on the top two integers; then, use the result to do conditional branching)

Characteristics of the DexGuard P-code VM

From the analysis of that code as well as virtualized methods found in other binaries, the characteristics of the p-code VM generated by DexGuard can be inferred:

  • The VM is a hybrid stack machine that uses 5 parallel stacks of the same height, stored in arrays of:
    • java.lang.Object (accommodating all objects, including arrays)
    • int (accommodating all small integers, including boolean and char)
    • long
    • float
    • double
  • For each one of the 5 stack types above, the VM uses two additional registers for storing and loading
  • Two stack pointers are used: one indicates the stack TOP, the other one seems to be used more liberally, and is akin to a peek register
  • The stack has a reserved area to store the virtualized method parameters (including this if the method is non-static)
  • The ISA encoding is trivial: each instruction is exactly one-word long, it is the opcode of the p-code instruction to be executed. There is no concept of register, index, or immediate value embedded into the instruction, as most stack machine ISA’s have.
  • Because the ISA is so simple, the implementation of the semantics of an instruction falls almost entirely on the p-code handler. For this reason, they were grouped into two categories:
    • Semi-generic VM operations (load/store, arithmetic, binary, tests) are handled by the VM class and have a positive id. (A VM object is used by every virtualized method in a virtualized class.)
    • Operations specific to a given virtualized method (e.g., method invocations) use negative ids and are handled within the virtualized method itself.

P-code obfuscation: junk insertion, spaghetti code

While the PCodeVM opcodes are all “useful”, many specific opcodes of a virtualized method (negative ids) achieve nothing but the execution of code semantically equivalent to NOP or GOTO.

opcodes -2, -1: essentially branching instructions. A substantial amount of those can be found, including some branching to blocks with no other input but that source (i.e., an unnecessary GOTO – =spaghetti code -, or a NOP operation if the next block is the follow.)

Rebuilding Virtualized Methods

Below, we explain the process used to rebuild a virtualized method. The CFG’s presented are IR-CFG’s (Intermediate Representations) used by the dexdec 4 pipeline. Note that unlike gendec‘s IR 5, dexdec‘s IR is not exposed publicly, but its textual representation is mostly self-explanatory.

Overall, a virtualized routine, once processed by dexdec like any other routine, looks like the following: A loop over p-code entries (stored in x8 below), processed by a() at 0xE first, or by the large routine switch.

Virtualized method, optimized, virtualized

The routine a() is PCodeVM.exec(), and its optimized IR boils down to a large single switch. 6

PCodeVM.exec()

The unvirtualizer needs to identify key items in order to get started, such as the p-code entries, identifiers used as indices into the p-code array, etc. Once they have been gathered, concolic execution of the virtualized routine becomes possible, and allows rebuilding a raw version of the original execution flow. Multiple caveats need to be taken care of, such as p-code inlining, branching, or flow termination. In its current state, the unvirtualizer disregards exceptional control flow.

Below, a raw version of the unflattened CFG. Note that all operations are stack-based; the code itself has not been modified at this point, it still consists of VM stack-based operations.

Virtualized method after unflattening, raw

dexdec’s standard IR optimization passes (dead-code removal, constant and variable propagation, folding, arithmetic simplification, flow simplifications, etc.) clean up the code substantially:

Virtualized method after unflattening and IR optimizations (opt1)

At this stage, all operations are stack-based. The high-level code generated from the above would be quite unwieldy and difficult to analyze, although substantially better than the original double-switch.

The next stage is to analyze stack-based operations to recover stack slots uses and convert them back to identifiers (which can be viewed as virtual registers; essentially, we realize the conversion of stack-based operations into register-based ones). Stack analysis can be done in a variety of ways, for example, using fixed-point analysis. Again, several caveats apply, and the need to properly identify stacks as well as their indices is crucial for this operations.

Virtualized method after unflattening, IR optimizations, VM stack analysis (opt2)

After another round of optimizations:

Virtualized method after unflattening, IR optimizations, VM stack analysis, IR optimizations (opt2_1)

Once the stack analysis is complete, we can replace stack slot accesses by identifier accesses.

Virtualized method after unflattening, IR optimizations, VM stack analysis, IR optimizations, virtual registers insertion (opt3)

After a round of optimizations:

Virtualized method after unflattening, IR optimizations, VM stack analysis, IR optimizations, virtual registers insertion, IR optimizations (opt3)

At this point, the “original” CFG is essentially reconstructed, and other advanced deobfuscation passes (e.g., emulated-based deobfuscators) can be applied.

The high-level code generation yields a clean, unvirtualized routine:

High-level code, unvirtualized, unmarked

After reversing, it appears to be a modified RC4 algorithm. Note the +3/+4 added to the key.

High-level code, unvirtualized, marked

Detecting Virtualized Methods

All versions of JEB detect methods and classes virtualized by DexGuard: run Global Analysis (GUI menu: Android) on your APK/DEX and look for those special events:

dexdec event:
“Found DexGuard Virtualized routine handler (P-Code VM, can be found in DexGuard>=8.3)”

JEB Pro version 3.22 7 ships with the unvirtualizer module.

Tips:

  • Make sure to enable the Obfuscators, and enable DexGuard Unvirtualization (enabled by default in the options).
  • The try-blocks analysis must be disabled for the class to unvirtualize. (Use MOD1+TAB to redecompile, untick “Parse Exception Blocks”).
  • After a first decompilation pass, it may be easier to identify guard0/guard1, rename, and recompile, else OP obfuscation will remain and make the code unnecessarily difficult to read. (Refer to part 1 of this series to learn about what renaming those fields to those special names means and does when DexGuard is detected.)

Conclusion

We hope you enjoyed this third installment on code (un)virtualization.

There will likely be a fourth and final chapter to this series on native code protection 8. Until next time!

  1. On a personal note, my first foray into VM-based protection dates back to 2009 with the analysis of Trojan.Clampi, a Windows malware protected with VMProtect
  2. Although one could argue that with current hardware (fast x64/ARM64 processors) and software (JIT’er and AOT compilers), that drawback may not be as relevant as it used to be.
  3. Machine here may be understood as physical machine or virtual machine
  4. dexdec is JEB’s dex decompiler engine
  5. gendec is JEB’s generic decompilation pipeline
  6. Note the similarities with CFG flattened by chenxification and similar techniques. One key difference here is that the next block may be determined using the p-code array, instead of a key variable, updated after each operation. I.e., is the FSM – controlling what the next state (= the next basic block) is – embedded in the flattened code itself, or implemented as a p-code array.
  7. JEB Android and JEB demo builds do not ship the unvirtualizer module. I initially wrote this module as a proof-of-concept not intended for release, but eventually decided to offer it to our professional users who have legitimate (non malicious) use cases, e.g. code audits and black-box assessments.
  8. That will be an appropriate segue into presenting JEB 4, which release was delayed far too long!

Reversing DexGuard, Part 2 – Assets and Code Encryption

Reversing DexGuard: Part 1, Part 2, Part 3

The second part of this series on DexGuard focuses on encryption:

  • Asset encryption
  • Class encryption
  • Full application encryption

Those analyses were done statically using JEB 3.21.

Asset Encryption

DexGuard can encrypt assets, while combining other techniques, such as class encryption (seen in several high-profile apps), and bytecode obfuscation (control-flow obfuscation, string encryption, reflected API access). With most bytecode obfuscation being automatically cleaned up, Assets are being accessed in the following way:

Purple and cyan tokens represent auto-decrypted code. The assets decryptor method was renamed to ‘dec’, it provides a FilterInputStream that transparently decrypts contents.
The DecryptorFilterStream (renamed) factory method

The DecryptorFilterStream object implements a variant of TEA (Tiny Encryption Algorithm), known for its simplicity of implementation and great performance 1.

Note the convoluted generation of Q_w, instead of hard-coding the immediate 0x9E37. Incidentally, a variant of that constant is also used by RC5 and RC6.
read() decrypts and buffers 64 bits of data at a time. The decryption loop consists of a variable number of rounds, between 5 and 16. Note that Q_w is used as a multiplier instead of an offset, as TEA/XTEA normally does.

It seems reasonable to assume that the encryption and decryption algorithms may not always be the same as this one. DexGuard making extensive use of polymorphism throughout its protection layers, it could be the case that during the protection phase, the encryption primitive is either user-selected or selected semi-randomly.

JEB can automatically emulate throughout this code and extract assets, and in fact, this is how encrypted classes, described in the next section, were extracted for analysis. However, this functionality is not present in current JEB Release builds. Since the vast majority of DexGuard uses are legitimate, we thought that shipping one-click auto-decryptors for data and code at this time was unnecessary, and would jeopardize the app security of several high-profile vendors.

Class Encryption

Class encryption, as seen in multiple recent apps as well, works as follows:

  • The class to be protected, CP, is encrypted, compressed, and stored in a file within the app folder. (The filename is random and seems to be terminated by a dot, although that could easily change.) Once decrypted, the file is a JAR containing a DEX holding CP and related classes.
  • CP is managed by a custom ClassLoader, CL.
  • CL is also encrypted, compressed, and stored in a file within the app folder. Once decrypted, the file is a JAR containing a DEX holding the custom class loader CL.
  • Within the application, code using CP (that is, any client that loads CP, invokes CP methods, or accesses CP fields) is replaced by code using CM, a class manager responsible for extracting CP and CL, and loading CL. CM offers bridge methods to the clients of CP, in order to achieve the original functionality.

The following diagram summarizes this mechanism:

DexGuard class encryption mechanism

Since applications protected with DexGuard use its extensive RASP (Runtime Application Self-Protection) facility to validate the environment they’re running on, the dynamic retrieval of CL and CP may prove difficult. In this analysis, it was retrieved statically by JEB.

Below, some client code using CM to create an encrypted-class object CP and execute a method on it. Everything is done via reflection. Items were renamed for enhanced clarity.

Encrypted class loading and virtual method invocation

CM is a heavy class, highly obfuscated. The first step in understanding it is to:

With auto-decryption and auto-unreflection enabled, the result is quite readable. A few snippets follow:

Decrypted files are deleted after loading. On older devices, loading is done with DexFile; on newer devices, it is done using InMemoryDexClassLoader.
In this case, the first encrypted JAR file (holding CL) is stored as “/e.”.
In this case, the second encrypted JAR file (holding CP and related) is stored as “/f.”.
The application held two additional couples, (“/a.”, “/b.”) and (“/c.”, “/d.”)

Once retrieved, those additional files can easily be “added” to the current DEX unit with IDexUnit.addDex() of your JEB project. Switch to the Terminal fragment, activate the Python interpreter (use py), and issue a command like:

Using Jython’s to add code to an existing DEX unit
The bnz class (CL) is a ClassLoader for the protected class (CP).

The protected class CP and other related classes, stored in “/f.” contained… anti-tampering verification code, which is part of DexGuard’s RASP facility! In other instances that were looked at, the protected classes contained: encrypted assets manager, custom code, API key maps, more RASP code, etc.

Full Application Encryption

“Full” encryption is taking class encryption to the extreme by encrypting almost all classes of an application. A custom Application object is generated, which simply overloads attachBaseContext(). On execution, the encrypted class manager will be called to decrypt and load the “original” application (all DexGuard protections still apply).

Custom application object used to provide full program encryption.

Note that activities can be encrypted as well. In the above case, the main activity is part of the encrypted jar.

Conclusion

That’s it for part 2. We focused on the encryption features of DexGuard. Both offer relatively limited protection for reverse-engineers willing to go the extra mile to retrieve original assets and bytecodes.

In Part 3, we will present what I think is the most interesting feature of DexGuard: code virtualization.

Until next time!

  1. The TEA encryption family is used by many win32 packers

Reversing DexGuard, Part 1 – Code Obfuscation & RASP

Reversing DexGuard: Part 1, Part 2, Part 3

GuardSquare’s DexGuard, the commercial offshoot of ProGuard geared toward Android app protection, has grown significantly since its introduction in 2013. Technically, what started as a “ProGuard + basic string encryption + code reflection” tool evolved into a multi-platform, complex solution including: control-flow obfuscation, complex and varied data and resources encryption, bytecode encryption, virtual environment and rooted system detection, application signature and certificate pinning enforcement, native code protection, as well as bytecode virtualization 1, and more.

GuardSquare seems to be keeping a tight lid on their product’s specs, as their website does not provide public datasheet or white papers. In this series, we present the result of reverse-engineering several recent apps protected with DexGuard.

This article presents the obfuscation techniques used by DexGuard, as well as DexGuard facility made available at runtime to protected programs 2. The analysis that follows was done statically, with JEB 3.20.

Identification

Identifying apps protected by DexGuard is relatively easy. It seems the default bytecode obfuscation settings place most classes in the o package, and some will be renamed to invalid names on a Windows system, such as con or aux. That being said, DexGuard’ed apps may not be as easily identified. Closer inspection of the code will reveal stronger hints than obfuscated names: decryption stubs, specific encrypted data, the presence of some so library files, are all tell tale signs of DexGuard, as shown below.

Running a Global Analysis

Let’s run a Global Analysis (menu Android, Global analysis…) with standard settings on the file and see what gets auto-decrypted and auto-unreflected:

Results subset of Global Analysis (redacted areas are meant to keep the analyzed program anonymous; it is a clean app, whose business logic is irrelevant to the analysis of DexGuard)

Lots of strings were decrypted, many of them specific to the app’s business logic itself, others related to DexGuard RASP – that is, library code embedded within the APK, responsible for performing app signature verification for instance, which is itself protected by DexGuard. That gives us valuable pointers into where we should be looking at if we’d like to focus on the protection code specifically.

Deobfuscating Code

The first section of this blog focuses on bytecode obfuscation and how JEB deals with it. It is mostly automated, but a final step requires manual assistance to achieve the best results.

Most obfuscated routines exhibit the following characteristics:

  • Dynamically generated strings via the use of per-class decryption routines
  • Most calls to external routines are done via reflection
  • Flow obfuscation via the use of a couple of opaque integer fields – let’s call them OPI0, OPI1. They are class fields generally initialized to 0 and 1.
  • Arithmetic operation obfuscation
  • Garbage code insertion
  • Unusual protected block structure, leading to fragmented try-blocks, unavoidable to produce semantically accurate raw code

As an example, the following class is used to perform app certificate validation in order, for instance, to prevent resigned apps from functioning. A few items were renamed for clarity; decompilation is done with disabled Deobfuscators (MOD1+TAB, untick “Enable deobfuscators”):

Take #1 (snippet) – The protected class is decompiled without deobfuscation in order to show semi-raw output (a few optimizers doing all sort of code cleanup are not categorized as deobfuscators internally, and will perform even if Deobfuscation is disabled). Note that a few items were also renamed for clarity.

In practice, such code is quite hard to comprehend on complex methods. With obfuscators enabled (the default setting), most of the above will be cleared.

See the re-decompilation of the same class, below.

  • strings are decrypted…
  • …enabling unreflection
  • most obfuscation is removed…
  • except for some control flow obfuscation that remains because JEB was unable to process OPI0/OPI1 directly (below,
Take #2 (full routine) – obfuscators enabled (default). The red blocks highlight use of opaque variables used to obfuscate control flow.

Let’s give a hint to JEB as to what OPI0/OPI1 are.

  • When analyzing DexGuard’ed apps, you can rename OPI0 and OPI1 to guard0 and guard1, respectively, to allow JEB go aggressively clean the code
  • Redecompile the class after renaming the fields
Take #3 (full routine) – with explicit guard0/guard1

That final output is clean and readable.

Other obfuscation techniques not exposed in this short routine above are arithmetic obfuscation and other operation complexification techniques. JEB will seamlessly deal with many of them. Example:

is optimized to

To summarize bytecode obfuscation as of DexGuard 8.4:

  • decryption and unreflection is done automatically 3
  • garbage clean-up, code clean-up is also generic and done automatically
  • control flow deobfuscation needs a bit of guidance to operate (guard0/guard1 renaming)

Runtime Verification

RASP library routines are used at the developers’ discretion. They consist of a set of classes that the application code can call at any time, to perform tasks such as:

  • App signing verification
  • Debuggability/debugger detection
  • Emulator detection
  • Root detection
  • Instrumentation toolkits detection
  • Certificate pinning
  • Manifest check
  • Permission checks

The client decides when and where to use them as well as what action should be taken on the results. The code itself is protected by DexGuard, that goes without saying.

App Signing Verification

  • Certificate verification uses the PackageManager to retrieve app’s signatures: PackageManager.getPackageInfo(packageName, GET_SIGNATURES).signatures
  • The signatures are hashed and compared to caller-provided values in an IntBuffer or LongBuffer.

Debug Detection

Debuggability check

The following checks must pass:

  • assert that Context.ctx.getApplicationInfo().flags & ApplicationInfo.FLAG_DEBUGGABLE is false
  • check the ro.debuggable property, in two ways to ensure consistency
    • using android.os.SystemProperties.get() (private API)
    • using the getprop‘s binary
  • verify that no hooking framework is detected (see specific section below)

Debugging session check

The following checks must pass:

  • assert that android.os.Debug.isDebuggerConnected() is false
  • verify no tracer process: tracerpid entry in /proc/<pid>/status must be <= 0
  • verify that no hooking framework is detected (see specific section below)

Debug key signing

  • enumerate the app’s signatures via PackageInfo.signatures
  • use getSubjectX500Principal() to verify that no certificate has a subject distinguished name (DN) equals to "CN=Android Debug,O=Android,C=US", which is the standard DN for debug certificates generated by the SDK tools

Emulator Detection

Emulator detection is done by checking any of the below.

1) All properties defined in system/build.prop are retrieved, hashed, and matched against a small set of hard-coded hashes:

86701cb958c69d64cd59322dfebacede -> property ???
19385aafbb452f39b5079513f668bbeb -> property ???
24ad686ec83d904347c5a916acbe1779 -> property ???
b8c8255febc6c46a3e43b369225ded3e -> property ???
d76386ddf2c96a9a92fc4bc8f829173c -> property ???
15fed45d5ca405da4e6aa9805daf2fbf -> property ??? (unused)

Unfortunately, we were not able to reverse those hashes back to known property strings – however, it was tried only on AOSP emulator images. If anybody wants to help and run the below on other build.prop files, feel free to let us know what property strings those hashes match to. Here is the hash verification source, to be run be on build.prop files.

2) The following file is readable:

/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq

3) Verify if any of those qemu, genymotion and bluestacks emulator files exist and are readable:

/dev/qemu_pipe
/dev/socket/baseband_genyd
/dev/socket/genyd
/dev/socket/qemud
/sys/qemu_trace
/system/lib/libc_malloc_debug_qemu.so
/dev/bst_gps
/dev/bst_time
/dev/socket/bstfolderd
/system/lib/libbstfolder_jni.so

4) Check for the presence of wired network interfaces: (via NetworkInterface.getNetworkInterfaces)

eth0
eth1

5) If the app has the permission READ_PHONE_STATE, telephony information is verified, an emulator is detected if any of the below matches (standard emulator image settings):

- "getLine1Number": "15555215554", "15555215556", "15555215558", "15555215560", "15555215562", "15555215564", "15555215566", "15555215568", "15555215570", "15555215572", "15555215574", "15555215576", "15555215578", "15555215580", "15555215582", "15555215584"
- "getNetworkOperatorName": "android"
- "getSimSerialNumber": "89014103211118510720"
- "getSubscriberId": "310260000000000"
- "getDeviceId": "000000000000000", "e21833235b6eef10", "012345678912345"

6) /proc checks:

/proc/ioports: entry "0ff :" (unknown port, likely used by some emulators)
/proc/self/maps: entry "gralloc.goldfish.so" (GF: older emulator kernel name)

7) Property checks (done in multiple ways with a consistency checks, as explained earlier), failed if any entry is found and start with one of the provided values:

- "ro.product.manufacturer": "Genymotion", "unknown", "chromium"
- "ro.product.device": "vbox86p", "generic", "generic_x86", "generic_x86_64"
- "ro.product.model": "sdk", "emulator", "App Runtime for Chrome", "Android SDK built for x86", "Android SDK built for x86_64"
- "ro.hardware": "goldfish", "vbox86", "ranchu"
- "ro.product.brand": "generic", "chromium"
- "ro.kernel.qemu": "1"
- "ro.secure": "0"
- "ro.build.product": "sdk", "vbox86p", "full_x86", "generic_x86", "generic_x86_64"
- "ro.build.fingerprint": "generic/sdk/generic", "generic_x86/sdk_x86/generic_x86", "generic/google_sdk/generic", "generic/vbox86p/vbox86p", "google/sdk_gphone_x86/generic_x86"
- "ro.bootloader": "unknown"
- "ro.bootimage.build.fingerprint": "Android-x86"
- "ro.build.display.id": "test-"
- "init.svc.qemu-props" (any value)
- "qemu.hw.mainkeys" (any value)
- "qemu.sf.fake_camera" (any value)
- "qemu.sf.lcd_density" (any value)
- "ro.kernel.android.qemud" (any value)

Hooking Systems Detection

The term covers a wide range of techniques designed to intercept regular control flow in order to examine and/or modify execution.

1) Xposed instrumentation framework detection, by attempting to load any of the classes:

de.robv.android.xposed.XposedBridge
de.robv.android.xposed.XC_MethodHook

Class loading is done in different ways in an attempt to circumvent hooking itself, using Class.forName with a variety of class loaders, custom class loaders and ClassLoader.getLoadedClass, as well as lower-level private methods, such as Class.classForName.

2) Cydia Substrate instrumentation framework detection.

3) ADBI (Android Dynamic Binary Instrumentation) detection

4) Stack frame verification: an exception is generated in order to retrieve a stack frame. The callers are hashed and compared to an expected hard-coded value.

5) Native code checks. This will be detailed in another blog, if time allows.

Root Detection

While root detection overlaps with most of the above, it is still another layer of security a determined attacker would have to jump over (or walk around) in order to get protected apps to run on unusual systems. Checks are plenty, and as is the case for all the code described here, heavily obfuscated. If you are analyzing such files, keeping the Deobfuscators enabled and providing guard0/guard1 hints is key to a smooth analysis.

Static initializer of the principal root detection class. Most artifacts indicative of a rooted device are searched for by hash.

Build.prop checks. As was described in emulator detection.

su execution. Attempt to execute su, and verify whether su -c id == root

su presence. su is looked up in the following locations:

/data/local/
/data/local/bin/
/data/local/xbin/
/sbin/
/system/bin/
/system/bin/.ext/
/system/bin/failsafe/
/system/sd/xbin/
/system/usr/we-need-root/
/system/xbin/

Magisk detection through mount. Check whether mount can be executed and contains databases/su.db (indicative of Magisk) or whether /proc/mounts contains references to databases/su.db.

Read-only system partitions. Check if any system partition is mounted as read-write (when it should be read-only). The result of mount is examined for any of the following entries marked rw:

/system
/system/bin
/system/sbin
/system/xbin
/vendor/bin
/sbin
/etc

Verify installed apps in the hope of finding one whose package name hashes to the hard-coded value:

0x9E6AE9309DBE9ECFL

Unfortunately, that value was not reversed, let us know if you find which package name generates this hash – see the algorithm below:

    public static long hashstring(String str) {
        long h = 0L;
        for(int i = 0; i < str.length(); i++) {
            int c = str.charAt(i);
            h = h << 5 ^ (0xFFFFFFFFF8000000L &amp; h) >> 27 ^ ((long)c);
        }
        return h;
    }

NOTE: App enumeration, as is common with DexGuard, is performed in two ways to maximize chances of evading partial hooks.

  • Straightforward: PackageManager.getInstalledApplications
  • More convoluted: iterate over all known MAIN intents: PackageManager.queryIntentActivities(new Intent("android.intent.action.MAIN")), derive the package name from the intent via ResolveInfo.activityInfo.packageName

SElinux verification. If the file /sys/fs/selinux/policy cannot be read, the check immediately passes. If it is readable, the policy is examined and hints indicative of a rooted device are looked for by hash comparison:

472001035L
-601740789L

The hashing algorithm is extremely simple, see below. For each byte of the file, the crc is updated and compared to hard-coded values.

long h = 0L;
//for each byte:
    h = (h << 5 ^ ((long)(((char)b)))) &amp; 0x3FFFFFFFL;
    // check h against known list

Running processes checks. All running processes and their command-lines are enumerated and hashed, and specific values are indirectly looked up by comparing against hard-coded lists.

APK Check

This verifier parses compressed entries in the APK (zip) file and compares them against well-known, hard-coded CRC values.

Manifest Check

Consistency checks on the application Manifest consists of enumerating the entries using two different ways and comparing results. Discrepancies are reported.

  • Open the archive’s MANIFEST.MF file via Context.getAssets(), parse manually
  • Use JarFile(Context.getPackageCodePath()).getManifest().getEntries()

Discrepancies in the Manifest could indicate system hooks attempting to conceal files added to the application.

Permissions Check

This routine checks for permission discrepancies between what’s declared by the app and what the system grant the app.

  • Set A: App permission gathering: all permissions requested and defined by the app, as well as all permissions offered by the system, plus the INTERACT_ACROSS_USERS and INTERACT_ACROSS_USERS_FULL permissions,
  • Set B: Retrieve all permissions that exist on the system
  • Define set C = B – A
  • For every permission in C, use checkCallingOrSelfPermission (API 22-) or checkSelfPermission (API 23+) to verify that the permission is not granted.

Permission discrepancies could be used to find out system hooks or unorthodox execution environments.

Note the “X & -(A+1) | ~X & A” checks. DexGuard makes use of multiple arithmetic tricks to complicate the obfuscation flow. Here, that expression is never equals to v2, and therefore, the if-check will always fail. JEB 3.20 does not clean all those artifacts.

Miscellaneous

Other runtime components include library code to perform SSL certificate pinning, as well as obfuscated wrappers around web view clients. None of those are of particular interest.

Wrapper for android.webkit.WebViewClient. Make sure to enable deobfuscators and provide guardX hints. When this is done, most methods will be crystal clear. In fact, the majority of them are simple forwarders.

Conclusion

That’s it for DexGuard obfuscation and runtime protection facility. Key take-away to analyze DexGuard’ed code:

  • Keep the obfuscators enabled
  • Locate the opaque integers, rename them to guard0/guard1 to give JEB a hint on where control flow deobfuscation should be performed, and redecompile the class

The second part in the series presents bytecode encryption and assets encryption.

  1. VM in VM, repeat ad nauseam – something not new to code protection systems, it’s existed on x86 for more than a decade, but new on Android, and other players in this field, commercial and otherwise, seem to be implementing similar solutions.
  2. So-called “RASP”, a relatively new acronym for Runtime Application Self-Protection
  3. Decryption and unreflection are generic processes of dexdec (the DEX Decompiler plugin); there is nothing specific to DexGuard here. The vast majority or encrypted data, regardless of the protection system in place, will be decrypted.

JEB Android Updates – Generic String Decryption, Lambda Recovery, Unreflecting Code, and More

Updated on March 11.

A note about 2020 Q1 updates (versions 3.10 to 3.16) regarding the DEX/Dalvik decompiler modules:

  • Generic String Decryption
  • Lambda Recovery
  • Unreflecting Code
  • Decompiling Java Bytecode
  • Auto-Rename All

Generic String Decryption

JEB ships with a generic deobfuscator that can perform on-the-fly string decryption and other complex optimizations. Although this optimizer performs safe (i.e., guaranteed) optimizations in most cases, it is unsafe in the general case case and therefore, may be disabled in the options. Refer to the Engines options .parsers.dcmp_dex.EnableDeobfuscators and .parsers.dcmp_dex.EmulationSupport.

Many code protectors such as DexGuard, Arxan, Dash-O, Allatori, etc. offer options to replace immediate string constants by method invocations that perform on-the-fly decryption.

A variety of techniques exist, ranging from simple one-off trivial decryptor methods, to complex schemes involving object(s) creation, complicated decryptors injected in third-party packages, non-trivial logic, junk code meant to slow down analyzers, use of opaque predicates, etc. They are implemented in an infinite number of ways. JEB’s generic deobfuscator can perform quick, safe emulation of the intermediate representation to provide a replacement. It may sometimes fail or bail out due to several reasons, such as performance or pitfalls like anti-emulation and anti-sandboxing techniques.

Example 1

The string decryptor is a static method reading encrypted string data in a class byte array. Also note that code reflection is used.

The above code (blue box) ends up being deobfuscated to:

Example 2:

The decryption methods were injected into library packages, e.g. Gson’s

The above code is deobfuscated to:

With the generic string decryptor optimizer ON

Below, a decryptor that had been injected into the com.google.gson.Gson() class:

The concat(String, int) method is not part of standard Gson, of course. It was injected by the protector and is used by (some) code to perform on-the-fly string decryption.

Example 3:

One last example, which was involuntarily – yet, quite timely! – provided by a user:

Static fields initialized with decrypted strings.
After decryption.

Decrypting all strings: The decryptor kicks in when decompiling methods only. At the moment, if a string happens to be successfully decrypted, the optimizer does not attempt to recover all similarly encrypted strings in the code, although it is most certainly an addition that will make it in a future software update.

Rendering: You may quickly identify decrypted strings in the client as they are rendered using a special color associated with the itemId STRING_GENERATED, by default rendered in a flashy pink color in light and dark themes. Hovering over such items will bring up a pop-up with additional origin information, like the underlying code that would have generated that string:

Auto-decrypted strings in pink; overlays with source/origin information.

API:
– From a DEX perspective: Generated strings are artificial. Therefore, IDexString.isArtificial() would return true.
– From a Java/AST perspective: IJavaConstant objects that embed origin information do so using the “origin” tag. Use IJavaConstant.getTags().get("origin") to retrieve it.

Lambda Recovery

JEB attempts to perform Java 8 style lambda recovery and reconstruction.

Desugared Lambdas

Recovery and reconstruction does not rely on any type of metadata 1, such as special prefixes -$$Lambda$ for classes and methods implementing desugared lambdas in dex 37-.

You may therefore see constructs like this:

This DEX file contains desugared, non-obfuscated lambdas.
This DEX file contained desugared, obfuscated lambdas

Options: Lambda reconstruction can be disabled in the options (Edit, Options, Engines, …). Lambda rendering can also be disabled in the options, as well as on-demand by right-clicking a decompiled view, Rendering Options….

Lambdas options

API Note: In the above cases, the underlying Java AST may be a IJavaNew or IJavaStaticField node. This is not the case for real (not desugared) lambdas, which map to an IJavaCall node – see below.

Real Lambdas

Lambda reconstruction also takes place when the code has not been desugared (which is rare!), i.e. code relying on dex38’s invoke-custom and invoke-polymorphic.

This DEX file contains real lambdas implemented via invoke-custom

API Note: Such lambdas map to an IJavaCall node for which isCustomCall() will return true.

Unreflecting Code

Many code protectors make heavy use of reflection – combined with string encryption, as we’ll see below – to obfuscate code. In practice, reflection is limited to method invocation (static and virtual), static and non-static field setting and getting, and new instance creation. A few examples:

v = Class.forName("java.lang.Integer").getMethod("valueOf",
        String.class).invoke(null, str);
// instead of 
v = Integer.valueOf(str);
Class.forName("SomeClassName").getField("b").setInt(x, 4);
// instead of 
x.b = 4;
Class.forName("java.lang.String").getConstructor(byte[].class)
        .newInstance(val);
// instead of
new String(arg6);

Such code is generally protected by a catch-all handler that forwards the cause of any exception raised by a reflection issue:

try {
    // ...
}
catch(Throwable e) {
    throw e.getCause();
}

By default, JEB will attempt to unreflect code. This deobfuscator is potentially unsafe and may be disabled in the options. Note that you always have the ability to choose, for a particular decompilation, whether some options should be temporarily enabled or disabled, by pressing CTRL+TAB (or COMMAND+TAB on macOS) to decompile (same as menu Action, Decompile with options…).

Unsafe deobfuscators can be globally disabled.

So, in a nutshell, code normally decompiled to:

Reflection, not cleaned (malware obfuscated by an older version of DexGuard (?))

will be decompiled to:

With reflection cleaned

Technical Note: This optimizer works on the Intermediate Representation manipulated by the decompiler, not to be confused with the AST rendered as its output. (The AST cleaner that was described in an older post is more limited than this IR optimizer.)

Last-step failures: Successfully unreflecting code eventually depends on being able to find the intended target method or field matching the provided description (method parameter types or field type). Failure to do so will generate a log like "A candidate field/method/constructor for unreflection was not found".

Decompiling Java Bytecode

JEB supports JLS bytecode decompilation for *.class files and jar-like archives (jar, war, ear, etc.). The Java bytecode is converted to Dalvik using Android’s dx by default. Users may choose to use d8 (not recommended for now) instead by selecting so in the Options.

The resulting DEX file(s) are processed as usual.

You may use this to decompile Android Library files (*.aar files) in JEB.

Examining the android-arch-core-runtime library

Auto-Rename All

JEB 3.13 introduced a new generic action, Auto-Rename All. Its implementation is at the discretion of code plugins. The DEX plugin implements it, therefore users may execute Action, Auto-Rename All… at any time (generally after processing an obfuscated file) in order to rename code items such as field, method, or class names, to something more easily processable for our -limited- human brains.

Look at this horrendous obfuscation scheme below. It’s using right-to-left unicode characters to seriously mess up rendering:

Obfuscated name using RTL Arabic characters

Let’s run Action, Auto-Rename All… on this file:

Auto-Renaming capabilities are provided (optionally) by plugins.
After auto-renaming code items in the above file. Not clearer in terms of meaning, but at least, it’s something we can start working on.

As usual, feel free to join us on Slack, message us on Twitter, or email us privately at support@pnfsoftware.com.

Until next time!

  1. Relying on metadata leads to false negatives in the best case – e.g., when the code has been minified by something like ProGuard; it leads to false positives in the worst case – e.g. forged metadata to incite the decompiler to generate inaccurate or wrong code.

Android metaresources.arsc

One of our users recently reported an Android resources.arsc file seemingly unprocessed by JEB. Upon closer inspection, it turned out this file was not a regular binary resources file, but instead, a compressed resources container serving as a generator for localized resources.arsc. Older versions of Google Play (eg, com.android.vending 11.6.18) and other official Google applications have been using this type of file, which is stored as a raw asset and sometimes named metaresources.arsc.

I decided to have a quick look. However, for better or worse, what was planned as a superficial exploration turned into a deep-dive into the rabbit-hole that was the “meta-arsc” parsing code.

Those files, as said above, are used to generate localized (non-English) resources.arsc files. That means that the client application can generate lightweight resources files on the fly. And presumably, APKs as well. Since this mechanism seems to be primarily used by the Play Store app, a reasonable use case could be Dynamic Delivery.

  • Full support was added into JEB (3.5-Beta)
  • A brief description of the file format can be found below
  • The fully annotated JDB2 is here (as well as the source apk) if you’d like to write your own implementation of a parser and localized arsc generator. The parser and generator have been thoroughly deobfuscated and commented out where need be. Package: com.google.d.a.a.a.a. Client code: FilteredResourceHelper.
    Drop both files (jdb2, apk) in a folder and open the JDB2 file in JEB

What does it look like when metaresources.arsc is processed in JEB?

JEB arsc_meta plugin, here seen processing a metaresources.arsc file. All localized resources.arsc files are generated and attached as children of the original meta file.
A french localized resources generated from a metaresources container. JEB processes those files as regular, stand-alone arsc files, and provides textual output similar to the one generated by the aapt2-dump tool from the Android SDK.

Binary Format

Disclaimer: Specification is a work-in-progress. Refer to the JDB2 annotation and code to fill in the gaps.

metaresources.arsc=
BE_UINT32                     cnt           count of languages
BE_UINT16[cnt]                langs         2-char language codes
MetaEntry[cnt]                metaentries   meta entries matching the language codes
CompressedResourceTableChunk  restab        a compressed resource table (code: 0x1002)
EOF                           -             not necessarily the EOF, but all metares
files examined contained a single resource chunk, which is a compressed resource tab MetaEntry= BE_UINT32 magic the value 'META' BE_UINT32 entrysize complete entry size (including the above
magic) in bytes BE_UINT16 lang language code VAR_INT32 cnt1 . VAR_INT32[cnt1] offsets1 a custom serialization of java.util.BitSet
(refer to JDB2 for details) holding
positions for strings and string styles
stored in string pool chunks VAR_INT32 cnt2 . VAR_INT32[cnt2] offsets2 offsets to Table Package chunk entries
(types, typespecs) => Compressed entries: - 5 types exist, basically non-XML chunk types - Their type code is the same as arsc's with the 0x1000 bit set - List of chunks: StringPool= refer to JDB2, class CompressedStringPoolChunk ResourceTable= refer to JDB2, class CompressedResourceTableChunk Package= refer to JDB2, class CompressedPackageChunk Type= refer to JDB2, class CompressedTypeChunk TypeSpec= refer to JDB2, class CompressedTypeSpecChunk

New version of Androsig

This post is a follow-up on a previous article: we have updated the Androsig plugin and the pre-generated set of library signatures.

Reminder: Androsig is a JEB plugin used to sign and match library code for Android applications.

The purpose of the plugin is to help deobfuscate lightly-obfuscated applications that perform name mangling and hierarchy flattening (such as Proguard and other common Java and Dalvik protectors). Using our collection of signatures for common libraries, library code can be recognized; methods and classes can be renamed; package hierarchies can be rebuilt

Examples

Below, an example of what that looks like on a test app:

Matched libraries on a sample app bundling the Android Support package

Another example: running Androsig on a large app (Vidmate 4.0809), see the reconstructed glide/… sub-packages below:

Matched libraries on a PlayStore app

Installation

1) Download the latest version of the compiled binary plugin and drop it into the JEB coreplugins/ folder. If you are running JEB 3.4+, the plugin should come bundled with your .

Link: JebAndroidSigPlugin-1.1.x.jar

This single JAR offers two plugin entry-points, as can be seen in the picture below:

2) Then download and extract the latest signatures package to your [JEB]/coreplugins/android_sigs/ folder.

Link: androsig_1.1_db_20190515.zip

The user interface was unchanged so you can refer to previous article for matching, generating, results and parameters.

Debugging Android apps on Android Pie and above

Update (March 2020): It looks like the problem highlighted below regarding the impossibility to read locals that do not have associated DebugLocalInfo on Android P and Q was fixed in Android R (verified with the Developer Preview 1 released on Feb 19). The patch solving this issue is likely this one.

Lower-level components of the Dalvik debugging stack, namely JDWP, JVM TI, and JVM DI implementations, were upgraded in Android Pie. It is something we indirectly noticed after installing P.beta-1 in the Spring of 2018. For lack of time, and because our recommendation is to debug apps (non-debuggable and debuggable alike) using API levels 21 (Lollipop) to 27 (Oreo)1, reversers could easily avoid road blocks which manifested in JEB as the following:

  • An empty local variable panel (with the exception of this for non-static methods)
  • Type 35 JDWP errors reported in the console, indicating that an invalid slot was being accessed

Since JEB 3.2 is out, I decided to revisit that error before jumping into anything else2

A type 35 error in this context means an invalid local slot is being accessed. In the example shown above, it would mean accessing a slot outside of [0, 10] (per the .registers directive) since the method declares a frame of 11 registers.

The second type of noticeable errors (not visible in the screenshot) were mix-ups between variable indices. Normally, and up to the JDWP implementation used in Android Pie, indices used to access slots were Java-style parameter indices (represented in Dalvik as pX), instead of Dalvik-style indices (vX). Converting from one to the other is trivial assuming the method staticity and prototype is known. It is a matter of generating pX so that they end up at the bottom of the frame. In the case above:

v0    p2
v1    p3
v2    p4
..
v9
v10   p0
v11   p1

When issuing a JDWP request (16,1) to read frame slots, we would normally use pX indices. It is no longer the case with Android P and Q: vX indices are to be used.

Open up JEB, start debugging any APK, switch to the Terminal view, and type ‘info’. In the context of JDWP, this JEB Debugger command issues Info and Sizes requests. Notice the differences:

=> On Android Oreo (API 27):

VM> info
Debuggee is running on ?
VM information: JDWP:"Android Runtime 2.1.0" v1.6 (VM:Dalvik v1.6.0)
VM identifier sizes: f=8,m=8,o=8,rt=8,fr=8

=> On Android Pie (API 28) and Android Q:

VM> info
Debuggee is running on ?
VM information: JDWP:"Java Debug Wire Protocol (Reference Implementation) version 1.8
JVM Debug Interface version 1.2
JVM version 0 (Dalvik, )" v1.8 (VM:Dalvik v0)
VM identifier sizes: f=4,m=4,o=8,rt=8,fr=8

Notice the reported version 1.2 for JVM DI, previously unspecified, and reported version 1.8 for JDWP, likely the cause of the breakage. Also note ID encoding size updates. JDWP had been reported a 1.6 version number, as well as field and method IDs encoded on 8 bytes, for as long as I can remember.

The vX/pX index issue was easily solved. It took a little while to crack the second issue. A superficial browsing of AOSP did not show anything fruitful, but after digging around, it seemed clear that this updated implementation of JDWP used CodeItem variables’ debug information to determine which variables are worth checking, and using what type.

In JEB, right-click and select Rendering Options, tick Show Debug Directives to display variable definition and re-definition information. In the example above, the APK holds information stating that v0 is being using as a boolean starting at address 2, and v1 a String starting at address 4. Android P+’s JDWP implementation does use this information to validate local variables accesses.

See below: at address 2, v0 has been declared and is rendered. v1 has not been declared yet, the debugger cannot read it (we’ll get error 35).

Single-step: at address 4, v1 is declared. Although it is uninitialized, the debugger can successfully read the var:

So – Up until P, this metadata information, when present (almost all Release-type builds of legitimate and malware files alike discard it), had been considered indicative. Now, the debugger takes it literally. There are multiple candidate reasons as of why, but an obvious one is Safety. JDWP has been known to have the potential to crash the VM when receiving reading requests for frame variables using a bad type. E.g., requesting to read an integer-holding slot as a reference would most likely crash the target VM. Using type information providing in metadata, a debugger server can now ensure that a debugger requesting to read a slot as type T is indeed a valid request – assuming the metadata is legitimate, and since the primary use case is to debug applications inside IDE, which hold source information used to generate valid debug metadata, the assumption is fair.

Validating access to local vars has the interesting side-effect to act as an anti-debugging feature. While debugging the app remains possible, not being able to easily read some locals (parameters pX are always readable though), can be quite an annoyance.

In the future, how could we work around the JDWP limitation? Well, aside from the obvious cop out “use Oreo or below”, an idea would be to extend JEB’s –makeapkdebug option (that generates a debuggable version of a non-debuggable APK) to insert DEX metadata information specifying that all variables of a frame are used and of a given type. That may not work depending on the type of validation performed by the DEX verifier, but it’s something worth exploring. Maybe more simply, an alternative could be a custom AOSP build that disabled that feature.3 Or better yet, finding if a system property exists to disable/enable that JDWP functionality.

A final note: debugging non-debuggable APKs on Android Pie or above also proved more difficult, if not practically impossible, than on Oreo and below. Assuming your phone is rooted, here’s a solution (found when browsing around AOSP commits). On a rooted phone:

> adb root
> adb shell setprop dalvik.vm.dex2oat-flags --debuggable
> adb shell stop
> adb shell start

Until next time – Thanks!

  1. Still a valid recommendation.
  2. Native decompiler upgrades are around the corner!
  3. If a charitable soul could point us to where in AOSP that would be, we’d greatly appreciate it? ūüôā

Android Updates in JEB 3.2

As the latest update makes its way to all users (changelog), it is a good time to quickly recap additions related to Android analysis that made it into JEB versions 3.1.4, 3.1.5, and 3.2.

Dalvik Decompiler Updates

The newest releases of JEB contain several improvements to the Dalvik decompiler. I will highlight only a couple that users may find interesting. 1

Enumerations

Compiled Java enumerations can be complicated beasts. JEB attempts to re-sugar them to the best of its ability. On failure, regular classes extending java.lang.Enum will be rendered.

Obfuscation sometimes destroy important synthetic fields and structures that allow recovery heuristics to work. However, support should function reasonably well, even on enumeration data that was intentionally shuffled to generate decompilation errors. Moreover, and to keep with the spirit of interactivity in JEB, enumerated fields can be renamed – and it is done consistently over the code base, including over reconstructed switches making use of such enums.

Decompiled enums in android.arch.lifecycle. Renaming and cross-referencing enumerated constants is supported.

Custom enumerated constants are also properly reconstructed, including:

  • Field annotations
  • Custom initializers (see below)
  • Additional methods and method overrides
In this complex enumeration, the red block shows a custom initializer. Other interesting bits are the use of overrides and custom methods, annotations, as well as default and non-default constructors.

Switches

Support was recently added for switch-on-enum and switch-on-string (partial support for the latter, to be continued in the next software update).

This successfully reconstructed switch-on-string is implemented as a double-switch idiom by dx (a sparse switch on hashCode/equals to generate custom indices i, followed by a packed-switch on i). Not all switches are implemented like this. Regular if-conditional trees may be strategically generated by optimizing compilers.

Inner classes, Anonymous classes

We improved rendering support for named- and anonymous-inner classes. Properly rendering anonymous classes in particular is made difficult by the fact that some of its arguments are captured from the outer classes. Properly rendering anonymous constructors, with exact argument types and position, is also challenging.

Lately, a user sent us a sample making use of an anonymous class initializer to hide string decryption code. See below:

  • The anonymous class extends Android’s OnActivityResultListener, instantiates the object, and tosses it immediately.
  • Decryption code takes place in the initializer. Note the captured arguments from the outer container method __m: i, _b. Access to other private class fields is made via synthetic accessor calls that were re-sugared into seemingly direct field access (BA._b).
Pseudo-moot anonymous class with an instance initializer attempting to conceal string decryption code.

Plugin options

Remember that some decompiler properties are publicly available in the options: (menu: Edit, Options, Advanced, Engines)

  • All Dalvik decompilation options: see the .parsers.dcmp_dex.* namespace
  • All Java rendering options of decompiled code: see the .parsers.dcmp_dex.text.* namespace

1)Rendering options are real-time options that can be changed after the fact to customize the output. Right-click on a decompiled class output, and select Rendering Options:

2) Decompilation options are used to guide and customize the decompilation. They can be changed in the Engines options, or more simply, when performing a decompilation itself, by invoking “Decompile with Options…” instead of “Decompile”.

Keyword for “Decompile with Options”:
CTRL+TAB (Windows, Linux) or COMMAND+TAB (macOS)

Bring up the “Decompile with Options” dialog by using CTRL+TAB/COMMAND+TAB when decompiling. Hover over properties to get extra documentation in the tooltip.

API additions

Essential updates to:

  • IJavaSwitch: additional methods to access switch-on-enum and switch-on-string data
  • IJavaForEach: additional type introduced to manipulate for-each statements: for(Type var: iterator_or_array) { … }

Other changes, What next

JEB 3.2 contains other improvements, such as:

  • Better auto-naming, including default usage of debug data, if present (can be disabled in the options)
  • Improved typing and type propagation
  • Additional IR and AST optimizations
  • Better exceptional flow processing
  • Rendering of try-catch, synchronized blocks, etc.
  • Decompilation of invoke-polymorphic (invoke-custom is not supported, see below the part on lambdas on method handles)

We have more planned for the coming releases, including:

  • Improved support for switch-on-string. As said earlier, some of those switches, when properly detected, are re-sugared into legal Java-8 switch-on-string. However, the nature of those high-level constructs (they are implemented as double-conditionals, sometimes double-switches) makes it quite hard in some cases to provide proper reconstruction. It is something that will be improved in the future.
  • Support for generics. We had decided to not implement Java 5-style type generic since the information, when provided, is stored as pure metadata and should not be trusted. However, in practice, it turns out to be helpful when auditing legitimate, non-obfuscated compiled apps. We will add optional support for that in a coming release.
  • Support for try-with-resources. try(resource)/catch/finally are difficult very-high-level idioms to reconstruct. Optimizing compilers generate a substantial amount of additional, highly optimized code to implicitly catch exceptions and auto-close resources, making it extra difficult to reconstruct in the general case. We will likely introduce partial support before the summer.
  • Lambdas. It is a planned addition. We will soon be re-sugaring Android implementations of Java 8+ lambdas into proper lambda functions. Same goes for method handles (::). That’s quite exciting and may pave the way for a hypothetical Kotlin decompiler, since that language implicitly and explicitly rely on lambdas extensively.

Debuggable APK Generation

For several reasons, it is easier to debug Android applications explicitly marked debuggable in their Manifest.

  • Debugging non-debuggable APK requires root access to the operating system. Which means rooting a production phone, using an emulator
    2 image built as userdebug, or building a custom userdebug image from AOSP.
  • Any of the above solutions have shortcomings: rooted production builds and userdebug builds expose features that non-rooted production builds do not have, and can be fingerprinted as such; Debugging native code of applications on non-rooted devices requires replacing system-level utilities; the API level and OS features also play a role, eg, SE-Android needs to be disabled on recent OS in order for debugging to work.

In many cases, rebuilding a release app into a debug-mode app (with <application android:debuggable=”true” …>) is a viable solution, and one that does not require using root, obviously. Many users are implementing this solution via apktool. However it is frequent for the tool to fail decoding complex APKs, let alone rebuild them with different settings.

We have introduced a feature in JEB that makes rebuilding non-debuggable APK to debuggable APK easy and fast:

$ jeb_wincon.bat -c --makeapkdebug -- file.apk

Upon success, file_debuggable.apk will be generated. Sign it (Android SDK’s apksigner), install it on your device, and start debugging. Remember that this solution has its shortcomings as well! Anti-debugging code may check at runtime that the app is not debuggable, as would be expected. More elaborate solutions implement certificate pinning-style checks, where the code verifies that it is signed using a specific certificate. Be careful when debugging rebuilt APK.

This malware app was made debuggable

Keyboard Shortcuts for Script

Bind your JEB Python scripts to keyboard shortcut by adding a line at the top of your script:

#?shortcut=xxx

where xxx is your keyboard shortcut, eg: Ctrl+Shift+T

Permitted keyboard modifiers are Ctrl, Shift, Alt, as well as the generic Mod1, mapping to macOS’s Command (Apple) key, or Control on Windows/Linux.

Sublime Text 3 Extension

Are you writing Python scripts to automate your JEB reversing tasks? If so, give a try to using the “JEB Script Development Helper” package available on Sublime Text’s Package Control.

JEB Python scripts with Sublime Text

To install it:

  • Install ST3
  • Install Package Control
  • Open the Package Control and Install a new extension
  • Search for “JEB” and install the extension

The extension allows you to:

  • Auto-completion on JEB types and attributes
  • Auto-import JEB classes: CTRL+ALT+I on a class names
  • Easily create script skeleton (CTRL+SHIFT+P, “JEB: Create a new script”)
  • Easily update to the latest API doc, usually published right after a new release (CTRL+SHIFT+P, “JEB: Update to latest API doc file”)

API changes

Recent API changes are not specific to Android components of JEB. You will find updated sample code on GitHub.

  1. If you are seeing unintended changes or bugs related to this update, let us know so that we can fix things quickly.
  2. Emulator here means an emulator running a userdebug Android build, as Google-provided images are

Dynamic JNI Detection Plugin

Update (Nov 29): the plugin was open-sourced on our GitHub repository. JEB 3.0.7+ is required to load and run it.

Java applications can call native methods stored in dynamic libraries via the Java Native Interface (JNI) framework. Android apps can do the same: developers can use the NDK to write their own .so library to use and distribute.

In this post, we briefly present how the binding mechanisms work, allowing a piece of bytecode to invoke native code routines.

Named Convention Method

The easiest way to call native method is as such:

In Java, class com.example.hellojni.HelloJni:

In C:

The native method name adheres to the standard JNI naming convention, allowing automatic resolution and binding.

The corresponding Dalvik bytecode is:

and here are the the corresponding ARM instructions:

JEB automatically binds those methods together, to allow easy debugging from bytecode to native code.

However, there is another way to bind native code to Java.

Dynamic JNI Method

One can decide to bind any function to Java without adhering to the naming convention, by using the JNIEnv->RegisterNatives method.

For example, the following line of code dynamically binds the Java method add(II)I to the native method add():

Due to its dynamic nature, statically resolving those bindings can prove difficult in practice, e.g. if names were removed or mangled, or if the code is obfuscated. Therefore, not all calls to RegisterNatives may be found and/or successfully processed.

However, JEB 3.0-beta.2 (to  be released this week) ships with an EnginesPlugin to heuristically detect Рsome of Рthese methods, and perform binding Рand of course, you will also be able to debug into them.

Execute the plugin via the File, Plugins menu

Once run, it will :

  • annotate the dex code with the target addresses:

  • rename targets (prefixing names with¬†__jni_) :

  • enable you to seamlessly debug into them (jump from Java to this JNI method)

 

Heuristics

As of this writing, the plugin uses several heuristics, implemented for ARM and ARM64 (Aarch64):

  • The first is the simplest one:¬†the JNIEnv->RegisterNatives method is commonly called from the standard JNI initialization function¬†JNI_OnLoad, so JEB searches for this method and attempt to find calls to RegisterNatives.

Once the ‘BL RegisterNatives‘ is found, JEB uses the¬†decompiler to create an IR representation of the block, and determines the values of R2 and R3 (X2 and X3 on Aarch64). R3 indicates the number of native methods to register, R2 is a pointer to the array of JNI native methods (structure with a pointer to method name, a pointer to method signature and a pointer to the native function bound):

Even if accurate, this method does not work when a Branch is issued via a register (BL R4) or method name is hidden.

  • The second heuristic is based on method name. First, in Dalvik, we search for all invocations to native methods. Then, for each method found, we search in binaries if there is a String reference matching the method name. (This heuristic is dangerous but yields decent results. A future plugin update may allow users to disable it.)

If found, the plugin looks at cross references of this String and checks if it looks like the expected JNI structure.

  • The third and last heuristic is the same as the previous one, but based on arguments. Since names can be shortened, they may not be interpreted as String, and thus not referenced, whereas it is easier to find argument signatures.

These three heuristics only work when methods are defined as a static array variable. Dynamic variables would need some emulation of the JNI_OnLoad method to be resolved.

As you can see, detection is currently based on heuristics, so obfuscated methods may be missing. Feel free to tweak and improve the plugin, it is available on our GitHub repository. As usual, feel free to reach out to us (email, Twitter, Slack) if you have questions or suggestions.

DEX Version 39, Dalvik and ART Opcode Overlaps, and JEB 2.3.11

JEB 2.3.11 is out¬†We’re getting close to completion on our 2.3 branch! 1

Before we get into the matter of this blog post, a couple of noteworthy changes in terms of licensing:

  • The Android Basic builds require an active Internet connection; however, if the JEB license is current, we allow a much longer grace period before requesting a connection with our licensing server. This is to take care of scenarios where the connectivity would drop for a relatively extended period of time on either end.
  • Most interestingly, expired licenses of all types may now be used past their expiration date to reload and work on existing JDB2. New projects cannot be created with expired licenses though.

In terms of features, JEB 2.3.11 includes upgrades to our ARM64, MIPS64 and x86-64 parsers 2, as well as fixes and additions to the DEX parser. One interesting update, which prompted writing this blog post, is the support of DEX 39 opcodes.

DEX 39 Opcodes

Here they are, per the official documentation:

  • const-method-handle vAA, method_handle@BBBB
  • const-method-type vAA, proto@BBBB

Version 39 of the DEX format will be supported with the release of Android P 3. DEX 38 had been introduced to support Oreo’s new opcodes related to dynamic programming. We wrote a lengthy post about them on this very blog.

The new instructions const-method-handle and const-method-types are natural additions to retrieve method handles (basically, the same as a function pointer in C, a concept foreign to the JVM until lambdas and functional-style programming made its way into the language) and method prototypes. Those opcodes simply query into the prototypes and handles pools.

In fact, support for those two opcodes was added in JEB months ago,¬† right after their introduction in ART, which dates back to September 2017 (AOSP commit). Now, if you’ve been following through the Dalvik, DEX and ART intricacies, you may know that we are facing opcode overlaps:

  • The original non-optimized DEX instruction set spans from 0 to 0xFF, with undefined ranges (inclusive brackets omitted for clarity): 3E-43, 73, 79-7A, E3-FF
    • DEX 38 defines the range FA-FD (4x new invoke-xxx)
    • DEX 39 defines the range FE-FE for the aforementioned new opcodes (2x new const-method-xxx)
  • The now defunct optimized DEX (ODEX) set, predating ART, used the reserved sub-range E3-FE
  • The deadborn extended set used FF as an extension code to address 2-byte opcodes (FFxx); they were defined but unimplemented in Ice Cream Sandwich, and removed soon after in Jelly Bean.
  • Finally, ART opcodes: also used for optimizing DEX execution, those opcodes use the 73 and E3-FF ranges

ART opcodes in E3-FE are not necessarily the same as the original ODEX’s! The following table recaps the differences between ODEX and OART:

legend: red= removed in ART, orange= moved, green= added in ART

When you feed a piece of optimized DEX file to JEB, it may not know which instruction set to use. Normally, the following rules apply:

  • For stand-alone (within or outside an APK) DEX files advertising a version code less than or equal to 37, the legacy ODEX set would be used if any opcodes within that range are encountered;
  • For DEX files with version 38 or above, or that are part of an OAT ELF file, the newer ART set will be used.

However, if the determination is incorrect (eg, you are opening a stand-alone DEX 37 file using ART opcodes), you may manually specify which optimized opcodes set the Dalvik parser should use by opening the project’s settings (Edit/Options, Advanced…), and setting the property¬†DalvikParserMode¬†4 to:

  • 0: legacy DEX (default value)
  • 50: ART
  • 100: DEX 38
  • 110: DEX 39
  • 1000: latest

That’s it for today’s DEX clarifications. Remember to upgrade to JEB 2.3.11. On a side-note, let us know if you’d like to be part of our group of early testers: those users receive beta builds ahead of time (eg, JEB 2.3.12-beta this week).

Thank you.

  1. A couple more updates are in the pipe before we start publishing betas of JEB 3.
  2. The x86 modules now support the newest AVX-512 instruction set, although we do not decompile it
  3. Per Google’s habits, we may expect a beta of Android P with API level 28 this Spring
  4. That property is not as accessible as we’d like; an upcoming update will clarify and improve the UX around that.