Reversing DexGuard, Part 3 – Code Virtualization

Reversing DexGuard: Part 1, Part 2, Part 3

The third part of this series on DexGuard is about bytecode virtualization. The analyses that follow were done statically using JEB 3.22.

Bytecode virtualization is, in my opinion, the most interesting and technically challenging feature of DexGuard. According to GuardSquare’s website, it was introduced in version 8.3 (mid 2019).

TL;DR: JEB Pro can “unvirtualize” methods virtualized by DexGuard 8.3+.

What Is Code Virtualization

Relatively novel, code virtualization is possibly one of the most effective protection technique there is 1. With it come relatively heavy disadvantages, such as hampered speed of execution 2 and the difficulty to troubleshoot production code. The advantages are heightened reverse-engineering hurdles over other more traditional software protection techniques.

Virtualization in the context of code protection means:

  • Generating a virtual machine M1
  • Translating an original code object C0 meant to be executed on a machine M0 3, into a semantically-equivalent code object C1, to be run on M1.

While the general features of M1 are likely to be fixed (e.g., all generations of M1 are stack machines with such and such characteristics), the Instruction Set Architecture (ISA) of M1 may not necessarily be. For example, opcodes, microcodes and their implementation may vary from generation to generation. As for C1, the characteristics of a generation are only constrained by the capabilities of the converter. Needless to say, standard obfuscation techniques can be applied on C1. The virtualization process can possibly be recursive (C1 could be a VM implementing the specifications of a machine M2, executing a code object C2, emulating the original behavior of C0, etc.).

All in all, in practice, this makes M1 and C1 unique and hard to reverse-engineer.

Before and after virtualization of a code object C0 into C1

Example of a Protected Method

Note: all identifier names had been obfuscated, as is almost always the case with DexGuard. They were renamed for clarity and understanding.

Below, the class VClass was found to be “virtualized”. A virtualized class means that all non-constructor (all but <init>(*)V and <clinit>()V) methods were virtualized.

Interestingly, DexGuard does not seem to virtualize constructors

The method d(byte[])byte[] is virtualized:

  • It was converted into an interpreter loop over two large switch constructs that branch on pseudo-code entries stored in the local array pcode.
  • A PCodeVM class was added. It is a modified stack-based virtual machine (more below) that performs basic load/store operations, custom loads/stores, as well as some arithmetic, binary and logical operations.
Virtualized method. Note the pcode array. The opcode handlers are located in two switches. This picture shows the second switch, used to handle specific operations and API calls.

A snippet of the p-code VM class. Full code here, also contains the virtualized class.

The generic interpreter is called via vm.exec(opcode). Execution falls back to a second switch entry, in the virtualized method, if the operation was not handled.

Please refer to the gist linked above for a full list of “generic” VM operations. Three examples, including one showing that the operations are not as generic as the term implies:

(specific to this VM) opcode 6, used to peek the most recently pushed object
(specific to this VM) opcode 8, a push-int operation
(specific to this VM) opcode 23 is relatively specialized, it implements an add-xor stack operation (pop, pop, push). It is quite interesting to see that the protection system does not simply generate one-to-one, dalvik-to-VM opcodes. Instead, the target routine is thoroughly analyzed, most likely lifted, high-level (compounded) arithmetic operations isolated, and pseudo-generic (in PCodeVM) or specialized (in the virtualized method) opcodes generated.

As said, negative opcodes represent custom operations specific to a virtualized method, including control flow changes. An example:

opcode -25: a if(a >=b) goto LABEL operation (first, call into opcode 55 to do a GE operation on the top two integers; then, use the result to do conditional branching)

Characteristics of the DexGuard P-code VM

From the analysis of that code as well as virtualized methods found in other binaries, the characteristics of the p-code VM generated by DexGuard can be inferred:

  • The VM is a hybrid stack machine that uses 5 parallel stacks of the same height, stored in arrays of:
    • java.lang.Object (accommodating all objects, including arrays)
    • int (accommodating all small integers, including boolean and char)
    • long
    • float
    • double
  • For each one of the 5 stack types above, the VM uses two additional registers for storing and loading
  • Two stack pointers are used: one indicates the stack TOP, the other one seems to be used more liberally, and is akin to a peek register
  • The stack has a reserved area to store the virtualized method parameters (including this if the method is non-static)
  • The ISA encoding is trivial: each instruction is exactly one-word long, it is the opcode of the p-code instruction to be executed. There is no concept of register, index, or immediate value embedded into the instruction, as most stack machine ISA’s have.
  • Because the ISA is so simple, the implementation of the semantics of an instruction falls almost entirely on the p-code handler. For this reason, they were grouped into two categories:
    • Semi-generic VM operations (load/store, arithmetic, binary, tests) are handled by the VM class and have a positive id. (A VM object is used by every virtualized method in a virtualized class.)
    • Operations specific to a given virtualized method (e.g., method invocations) use negative ids and are handled within the virtualized method itself.

P-code obfuscation: junk insertion, spaghetti code

While the PCodeVM opcodes are all “useful”, many specific opcodes of a virtualized method (negative ids) achieve nothing but the execution of code semantically equivalent to NOP or GOTO.

opcodes -2, -1: essentially branching instructions. A substantial amount of those can be found, including some branching to blocks with no other input but that source (i.e., an unnecessary GOTO – =spaghetti code -, or a NOP operation if the next block is the follow.)

Rebuilding Virtualized Methods

Below, we explain the process used to rebuild a virtualized method. The CFG’s presented are IR-CFG’s (Intermediate Representations) used by the dexdec 4 pipeline. Note that unlike gendec‘s IR 5, dexdec‘s IR is not exposed publicly, but its textual representation is mostly self-explanatory.

Overall, a virtualized routine, once processed by dexdec like any other routine, looks like the following: A loop over p-code entries (stored in x8 below), processed by a() at 0xE first, or by the large routine switch.

Virtualized method, optimized, virtualized

The routine a() is PCodeVM.exec(), and its optimized IR boils down to a large single switch. 6

PCodeVM.exec()

The unvirtualizer needs to identify key items in order to get started, such as the p-code entries, identifiers used as indices into the p-code array, etc. Once they have been gathered, concolic execution of the virtualized routine becomes possible, and allows rebuilding a raw version of the original execution flow. Multiple caveats need to be taken care of, such as p-code inlining, branching, or flow termination. In its current state, the unvirtualizer disregards exceptional control flow.

Below, a raw version of the unflattened CFG. Note that all operations are stack-based; the code itself has not been modified at this point, it still consists of VM stack-based operations.

Virtualized method after unflattening, raw

dexdec’s standard IR optimization passes (dead-code removal, constant and variable propagation, folding, arithmetic simplification, flow simplifications, etc.) clean up the code substantially:

Virtualized method after unflattening and IR optimizations (opt1)

At this stage, all operations are stack-based. The high-level code generated from the above would be quite unwieldy and difficult to analyze, although substantially better than the original double-switch.

The next stage is to analyze stack-based operations to recover stack slots uses and convert them back to identifiers (which can be viewed as virtual registers; essentially, we realize the conversion of stack-based operations into register-based ones). Stack analysis can be done in a variety of ways, for example, using fixed-point analysis. Again, several caveats apply, and the need to properly identify stacks as well as their indices is crucial for this operations.

Virtualized method after unflattening, IR optimizations, VM stack analysis (opt2)

After another round of optimizations:

Virtualized method after unflattening, IR optimizations, VM stack analysis, IR optimizations (opt2_1)

Once the stack analysis is complete, we can replace stack slot accesses by identifier accesses.

Virtualized method after unflattening, IR optimizations, VM stack analysis, IR optimizations, virtual registers insertion (opt3)

After a round of optimizations:

Virtualized method after unflattening, IR optimizations, VM stack analysis, IR optimizations, virtual registers insertion, IR optimizations (opt3)

At this point, the “original” CFG is essentially reconstructed, and other advanced deobfuscation passes (e.g., emulated-based deobfuscators) can be applied.

The high-level code generation yields a clean, unvirtualized routine:

High-level code, unvirtualized, unmarked

After reversing, it appears to be a modified RC4 algorithm. Note the +3/+4 added to the key.

High-level code, unvirtualized, marked

Detecting Virtualized Methods

All versions of JEB detect methods and classes virtualized by DexGuard: run Global Analysis (GUI menu: Android) on your APK/DEX and look for those special events:

dexdec event:
“Found DexGuard Virtualized routine handler (P-Code VM, can be found in DexGuard>=8.3)”

JEB Pro version 3.22 7 ships with the unvirtualizer module.

Tips:

  • Make sure to enable the Obfuscators, and enable DexGuard Unvirtualization (enabled by default in the options).
  • The try-blocks analysis must be disabled for the class to unvirtualize. (Use MOD1+TAB to redecompile, untick “Parse Exception Blocks”).
  • After a first decompilation pass, it may be easier to identify guard0/guard1, rename, and recompile, else OP obfuscation will remain and make the code unnecessarily difficult to read. (Refer to part 1 of this series to learn about what renaming those fields to those special names means and does when DexGuard is detected.)

Conclusion

We hope you enjoyed this third installment on code (un)virtualization.

There will likely be a fourth and final chapter to this series on native code protection 8. Until next time!

  1. On a personal note, my first foray into VM-based protection dates back to 2009 with the analysis of Trojan.Clampi, a Windows malware protected with VMProtect
  2. Although one could argue that with current hardware (fast x64/ARM64 processors) and software (JIT’er and AOT compilers), that drawback may not be as relevant as it used to be.
  3. Machine here may be understood as physical machine or virtual machine
  4. dexdec is JEB’s dex decompiler engine
  5. gendec is JEB’s generic decompilation pipeline
  6. Note the similarities with CFG flattened by chenxification and similar techniques. One key difference here is that the next block may be determined using the p-code array, instead of a key variable, updated after each operation. I.e., is the FSM – controlling what the next state (= the next basic block) is – embedded in the flattened code itself, or implemented as a p-code array.
  7. JEB Android and JEB demo builds do not ship the unvirtualizer module. I initially wrote this module as a proof-of-concept not intended for release, but eventually decided to offer it to our professional users who have legitimate (non malicious) use cases, e.g. code audits and black-box assessments.
  8. That will be an appropriate segue into presenting JEB 4, which release was delayed far too long!

Reversing DexGuard, Part 2 – Assets and Code Encryption

Reversing DexGuard: Part 1, Part 2, Part 3

The second part of this series on DexGuard focuses on encryption:

  • Asset encryption
  • Class encryption
  • Full application encryption

Those analyses were done statically using JEB 3.21.

Asset Encryption

DexGuard can encrypt assets, while combining other techniques, such as class encryption (seen in several high-profile apps), and bytecode obfuscation (control-flow obfuscation, string encryption, reflected API access). With most bytecode obfuscation being automatically cleaned up, Assets are being accessed in the following way:

Purple and cyan tokens represent auto-decrypted code. The assets decryptor method was renamed to ‘dec’, it provides a FilterInputStream that transparently decrypts contents.
The DecryptorFilterStream (renamed) factory method

The DecryptorFilterStream object implements a variant of TEA (Tiny Encryption Algorithm), known for its simplicity of implementation and great performance 1.

Note the convoluted generation of Q_w, instead of hard-coding the immediate 0x9E37. Incidentally, a variant of that constant is also used by RC5 and RC6.
read() decrypts and buffers 64 bits of data at a time. The decryption loop consists of a variable number of rounds, between 5 and 16. Note that Q_w is used as a multiplier instead of an offset, as TEA/XTEA normally does.

It seems reasonable to assume that the encryption and decryption algorithms may not always be the same as this one. DexGuard making extensive use of polymorphism throughout its protection layers, it could be the case that during the protection phase, the encryption primitive is either user-selected or selected semi-randomly.

JEB can automatically emulate throughout this code and extract assets, and in fact, this is how encrypted classes, described in the next section, were extracted for analysis. However, this functionality is not present in current JEB Release builds. Since the vast majority of DexGuard uses are legitimate, we thought that shipping one-click auto-decryptors for data and code at this time was unnecessary, and would jeopardize the app security of several high-profile vendors.

Class Encryption

Class encryption, as seen in multiple recent apps as well, works as follows:

  • The class to be protected, CP, is encrypted, compressed, and stored in a file within the app folder. (The filename is random and seems to be terminated by a dot, although that could easily change.) Once decrypted, the file is a JAR containing a DEX holding CP and related classes.
  • CP is managed by a custom ClassLoader, CL.
  • CL is also encrypted, compressed, and stored in a file within the app folder. Once decrypted, the file is a JAR containing a DEX holding the custom class loader CL.
  • Within the application, code using CP (that is, any client that loads CP, invokes CP methods, or accesses CP fields) is replaced by code using CM, a class manager responsible for extracting CP and CL, and loading CL. CM offers bridge methods to the clients of CP, in order to achieve the original functionality.

The following diagram summarizes this mechanism:

DexGuard class encryption mechanism

Since applications protected with DexGuard use its extensive RASP (Runtime Application Self-Protection) facility to validate the environment they’re running on, the dynamic retrieval of CL and CP may prove difficult. In this analysis, it was retrieved statically by JEB.

Below, some client code using CM to create an encrypted-class object CP and execute a method on it. Everything is done via reflection. Items were renamed for enhanced clarity.

Encrypted class loading and virtual method invocation

CM is a heavy class, highly obfuscated. The first step in understanding it is to:

With auto-decryption and auto-unreflection enabled, the result is quite readable. A few snippets follow:

Decrypted files are deleted after loading. On older devices, loading is done with DexFile; on newer devices, it is done using InMemoryDexClassLoader.
In this case, the first encrypted JAR file (holding CL) is stored as “/e.”.
In this case, the second encrypted JAR file (holding CP and related) is stored as “/f.”.
The application held two additional couples, (“/a.”, “/b.”) and (“/c.”, “/d.”)

Once retrieved, those additional files can easily be “added” to the current DEX unit with IDexUnit.addDex() of your JEB project. Switch to the Terminal fragment, activate the Python interpreter (use py), and issue a command like:

Using Jython’s to add code to an existing DEX unit
The bnz class (CL) is a ClassLoader for the protected class (CP).

The protected class CP and other related classes, stored in “/f.” contained… anti-tampering verification code, which is part of DexGuard’s RASP facility! In other instances that were looked at, the protected classes contained: encrypted assets manager, custom code, API key maps, more RASP code, etc.

Full Application Encryption

“Full” encryption is taking class encryption to the extreme by encrypting almost all classes of an application. A custom Application object is generated, which simply overloads attachBaseContext(). On execution, the encrypted class manager will be called to decrypt and load the “original” application (all DexGuard protections still apply).

Custom application object used to provide full program encryption.

Note that activities can be encrypted as well. In the above case, the main activity is part of the encrypted jar.

Conclusion

That’s it for part 2. We focused on the encryption features of DexGuard. Both offer relatively limited protection for reverse-engineers willing to go the extra mile to retrieve original assets and bytecodes.

In Part 3, we will present what I think is the most interesting feature of DexGuard: code virtualization.

Until next time!

  1. The TEA encryption family is used by many win32 packers