From a very high-level perspective, a Dex method scheduled for decompilation goes through the following processing pipeline:
- Dalvik method converted to low-level IR
- SSA transformation and Typing
- IR optimizations
- Final high-level IR converted to AST
- AST optimizations
- Final clean AST rendered as pseudo-Java code (NOTE: access to the AST is already possible via JEB’s Java AST API)
Phase 3 (IR optimizations) consists of repeatedly calling IR processors, which take an input IR and transform it into another, further refined IR (a process called “lifting”). IR processors range from junk code cleaners to variable propagation, immediate propagation, constant folding, higher-level construct rebuilding, compound predicate rebuilding, and code restructuring, all the way to all sorts of obfuscation removers and advanced optimizers that may involve emulation, dynamic or symbolic execution, etc.
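The repeated application of IR processors can be pictured as a fixed-point loop. The sketch below is a toy model in plain Python, not dexdec's implementation: the IR is a list of tuples, and the names fold_constants, propagate_immediates and run_until_stable are hypothetical, made up for this illustration. Each pass returns the number of changes it made, and the driver keeps going until a full round changes nothing.

```python
# Toy IR, NOT dexdec's: each statement is ('assign', dst, expr), where expr is
# an int, a variable name (str), or ('add', left, right).

def fold_constants(ir):
    """Fold ('add', int, int) into a single int. Returns the number of folds."""
    count = 0
    for i, (op, dst, expr) in enumerate(ir):
        if (isinstance(expr, tuple) and isinstance(expr[1], int)
                and isinstance(expr[2], int)):
            ir[i] = (op, dst, expr[1] + expr[2])
            count += 1
    return count


def propagate_immediates(ir):
    """Substitute variables known to hold a constant into 'add' operands.

    Assumes each variable is assigned once (as after an SSA transformation).
    Returns the number of substitutions.
    """
    known = dict((dst, expr) for (_, dst, expr) in ir if isinstance(expr, int))
    count = 0
    for i, (op, dst, expr) in enumerate(ir):
        if isinstance(expr, tuple):
            operands = [known.get(o, o) if isinstance(o, str) else o
                        for o in expr[1:]]
            count += sum(1 for old, new in zip(expr[1:], operands) if old != new)
            ir[i] = (op, dst, (expr[0],) + tuple(operands))
    return count


def run_until_stable(ir, passes, max_rounds=100):
    """Call every pass in turn, round after round, until nothing changes."""
    total = 0
    for _ in range(max_rounds):
        changed = sum(p(ir) for p in passes)
        if changed == 0:
            break
        total += changed
    return total


ir = [('assign', 'x', ('add', 1, 2)), ('assign', 'y', ('add', 'x', 4))]
total = run_until_stable(ir, [fold_constants, propagate_immediates])
```

Here neither pass can fully simplify the second statement alone: folding x enables the propagation into y, which in turn enables a second fold. That interplay is why the passes are called repeatedly rather than once each.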
By working at this level, power users can write custom deobfuscators that we may not be able to deliver as JEB built-ins for a variety of reasons (e.g. obfuscation specific to a single group of files, custom protection applied to files under NDA, etc.).
A sample dexdec IR plugin
dexdec IR plugins are JEB back-end plugins (not front-end scripts). Therefore, they are to be dropped in the coreplugins folder (or coreplugins/scripts for plugin scripts). They can be written as:
- Precompiled jar files: the source language can be anything that compiles to Java bytecode; those plugins cannot be hot-swapped, and therefore are not ideal for prototyping/experimenting; they are great for mature plugins though.
- Java plugin scripts: single Java source files. Strong typing and IDE integration (e.g. with Eclipse or IntelliJ) with Javadoc make them ideal for developing complex plugins. Hot reload is supported: they can be seamlessly modified while JEB is running, making them great for prototyping.
- Python plugin scripts: written in Python 2.7 syntax. Hot reload is supported. Restriction: unlike other plugins, an instance of a Python script plugin may be shared by multiple decompilation threads. Therefore, such plugins must be thread-safe and support concurrency.
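The thread-safety restriction on Python script plugins mostly comes down to where per-call state is kept. The following is a minimal sketch in plain Python, runnable outside JEB: UnsafeCounter, SafeCounter and run_concurrently are hypothetical names, and the classes deliberately do not extend any JEB type. It only illustrates the general pattern of keeping scratch state in local variables when one instance may serve several threads.

```python
import threading


class UnsafeCounter(object):
    """Anti-pattern: per-call scratch state stored on the shared instance."""
    def perform(self, items):
        self.count = 0                 # shared by all threads: may be clobbered
        for item in items:
            if item == 'nop':
                self.count += 1
        return self.count


class SafeCounter(object):
    """Per-call state lives in locals, so concurrent calls cannot interfere."""
    def perform(self, items):
        count = 0                      # local to this call, hence to this thread
        for item in items:
            if item == 'nop':
                count += 1
        return count


def run_concurrently(optimizer, workloads):
    """Run optimizer.perform() on each workload in its own thread."""
    results = [None] * len(workloads)

    def worker(index, items):
        results[index] = optimizer.perform(items)

    threads = [threading.Thread(target=worker, args=(i, w))
               for i, w in enumerate(workloads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# One shared instance, eight threads, each with a different workload.
workloads = [['nop'] * n + ['assign'] for n in range(8)]
results = run_concurrently(SafeCounter(), workloads)
```

With SafeCounter, each thread gets the count for its own workload regardless of scheduling. The same cannot be guaranteed for UnsafeCounter, since two threads may reset or increment the same attribute.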
In this post, we will show how to write a Python plugin script. Users accustomed to JEB client scripting will be in familiar territory.
IMPORTANT! Note that loading such plugins is not enabled by default in JEB. Add the following line to your bin/jeb-engines.cfg file to enable loading Python plugins:
.LoadPythonPlugins = true
dexdec IR plugins must implement the IDOptimizer interface. In practice, it is highly recommended to extend the implementing class AbstractDOptimizer, like this:
from com.pnfsoftware.jeb.core.units.code.android.ir import AbstractDOptimizer

# sample IR plugin, does nothing but log the IR CFG
class DOptSamplePython(AbstractDOptimizer):
  # perform() returns the number of optimizations performed
  def perform(self):
    self.logger.info('MARKER - Input IR-CFG: %s', self.cfg)
    return 0
IMPORTANT! All dexdec IR public interfaces and types are located in the com.pnfsoftware.jeb.core.units.code.android.ir package. Keep a tab open on that package’s API documentation while you develop IR plugins!
The skeleton above:
- must have the same filename as the plugin class, therefore DOptSamplePython.py
- must be dropped in coreplugins/scripts/
- requires Python script plugins to be enabled in your engines configuration
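The three requirements above can be checked before starting JEB. The helper below is a hypothetical convenience, not part of JEB: check_plugin_layout is a made-up name, and it assumes the engines configuration uses the plain "key = value" line format shown earlier in this post. It only inspects strings and paths.

```python
import os
import re


def check_plugin_layout(script_path, source, engines_cfg_text):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []

    # 1. The file name must match the plugin class name.
    stem, ext = os.path.splitext(os.path.basename(script_path))
    if ext != '.py':
        problems.append('not a .py file')
    if not re.search(r'^class\s+%s\b' % re.escape(stem), source, re.MULTILINE):
        problems.append('no class named %s in the script' % stem)

    # 2. The script must live in coreplugins/scripts/.
    parent = os.path.normpath(os.path.dirname(script_path)).replace(os.sep, '/')
    if not parent.endswith('coreplugins/scripts'):
        problems.append('script is not in coreplugins/scripts/')

    # 3. Python plugins must be enabled in the engines configuration.
    if not re.search(r'^\s*\.LoadPythonPlugins\s*=\s*true\s*$',
                     engines_cfg_text, re.MULTILINE):
        problems.append('.LoadPythonPlugins = true is missing')

    return problems


good = check_plugin_layout(
    'coreplugins/scripts/DOptSamplePython.py',
    'class DOptSamplePython(AbstractDOptimizer):\n  pass\n',
    '.LoadPythonPlugins = true\n')
bad = check_plugin_layout('scripts/Other.py', 'x = 1\n', '')
```

For the sample skeleton, good is an empty list, while bad reports one problem for each of the three requirements.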
If you haven’t done so, start JEB. Your plugin should appear in the list of dexdec plugins. Check the Decompiler Plugins handler in the Android menu.
Now load a dex/apk, and decompile any class. Your plugin will eventually be called. The logger view should attest to that by displaying multiple “MARKER - Input IR-CFG: …” lines.
dexdec Intermediate Representation
dexdec’s IR consists of IDElement objects. Every IR statement is an IDInstruction, itself an IDElement. (All those types and their attributes are described in depth in the API doc.) When an IR plugin is called, it “receives” an IDMethodContext (representing a decompiled method), stored in the optimizer’s ctx public field. The IR CFG, a control flow graph consisting of IR statements, can be retrieved via ctx.getCfg(). It is also stored in the cfg public field, for convenience. A formatted IR CFG may look like this:
0000/2+ !onCreate(v4<com.pnfsoftware.raasta.AppHelp>, v5<android.os.Bundle>)<void>
0002/2: !requestWindowFeature(v4<com.pnfsoftware.raasta.AppHelp>, 1)<boolean>
0004/3: !setContentView(v4<com.pnfsoftware.raasta.AppHelp>, 7F030000)<void>
0007/5: !x4<android.webkit.WebView> = ((android.webkit.WebView)findViewById(v4<com.pnfsoftware.raasta.AppHelp>, 7F070000)<android.view.View>)<android.webkit.WebView>
000C/2: !loadData(x4<android.webkit.WebView>, getString(v4<com.pnfsoftware.raasta.AppHelp>, 7F05005B)<java.lang.String>, "text/html", "utf-8")<void>
000E/3: !setBackgroundColor(x4<android.webkit.WebView>, 0)<void>
0011/1: !setDefaultTextEncodingName(getSettings(x4<android.webkit.WebView>)<android.webkit.WebSettings>, "utf-8")<void>
0012/1: return
IR statements (IDInstruction) can have any of the following opcodes (see the API doc):
- IR_NOP: no-operation
- IR_ASSIGN: assignment
- IR_INVOKE: invocation (including new object and new array construction)
- IR_JUMP: unconditional jump
- IR_JCOND: conditional jump
- IR_SWITCH: switch statement
- IR_RETURN: return statement
- IR_THROW: throw statement
- IR_STORE_EXCEPTION: exception retrieval (special)
- IR_MONITOR_ENTER: VM monitor acquisition
- IR_MONITOR_EXIT: VM monitor release
Statement operands are themselves IDElement objects, such as IDImm (immediate values), IDOperation (arithmetic/bitwise/cast operations), IDInvokeInfo (method invocation details), IDArrayElt (representing array elements), IDField (representing static or instance fields), etc. Refer to the hierarchy of IDElement for a complete list.
IR statements can be seen as recursive IR expression trees. They can be easily explored (see the visitXxx() methods) and manipulated. They can be replaced by newly created elements (see the IDMethodContext.createXxx methods). Data-flow analysis can be performed on the IR CFG to retrieve use-def and def-use chains, as well as other variable liveness and reachability information (see the API doc).
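To make the idea of a recursive expression tree concrete, here is a toy model in plain Python. It does not use the dexdec API: Imm, Call, visit and replace_child are hypothetical stand-ins that only mimic the concepts (immediates, call expressions, a pre-order visitor, and in-place replacement of a sub-expression within its parent).

```python
class Imm(object):
    """Immediate value (toy stand-in for an IR constant)."""
    def __init__(self, value):
        self.value = value
        self.children = []


class Call(object):
    """Method call whose arguments are sub-expressions."""
    def __init__(self, name, *args):
        self.name = name
        self.children = list(args)


def replace_child(parent, old, new):
    """Swap one direct sub-expression of parent. Returns True on success."""
    for i, child in enumerate(parent.children):
        if child is old:
            parent.children[i] = new
            return True
    return False


def visit(node, callback, parent=None):
    """Pre-order walk: callback(node, parent) may return a replacement node."""
    repl = callback(node, parent)
    if repl is not None and parent is not None and replace_child(parent, node, repl):
        node = repl                    # keep walking from the replacement
    for child in list(node.children):
        visit(child, callback, node)


def render(node):
    """Format the tree as a nested call string."""
    if isinstance(node, Imm):
        return str(node.value)
    return '%s(%s)' % (node.name, ', '.join(render(c) for c in node.children))


def fold_known_call(node, parent):
    """Replace a call whose result we assume to be constant by an immediate."""
    if isinstance(node, Call) and node.name == 'getFadingEdgeLength':
        return Imm(12)
    return None


tree = Call('read', Imm(0), Call('shr', Call('getFadingEdgeLength'), Imm(16)))
visit(tree, fold_known_call)
rendered = render(tree)
```

Because the walk is pre-order, a node is offered to the callback before its children, and after a replacement the walk continues from the new node rather than from the one that was removed.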
Use-case: cleaning useless Android calls
Let’s put this new API to practical, real-world use. First, some background: JEB ships with emulator-backed IR optimizers that attempt to auto-decrypt immediates such as strings. While this deobfuscator generally performs well on protected files, we have lately received samples for which strings were not decrypted. The reason is quite straightforward; see this example:
throw new java.lang.IllegalStateException(o.isUserRecoverableError.read(((char)android.text.TextUtils.getOffsetBefore("", 0)), 12 - java.lang.Long.compare(android.os.Process.getElapsedCpuTime(), 0L), (android.view.ViewConfiguration.getFadingEdgeLength() >> 16) + 798).intern());
In the above code (extracted from a protected method), read is a string decryptor. Alas, the presence of calls such as:
- TextUtils.getOffsetBefore("", 0)
- Long.compare(Process.getElapsedCpuTime(), 0L)
- ViewConfiguration.getFadingEdgeLength() >> 16
prevents the generic decryptor from kicking in. Indeed, what is an emulator supposed to make of those calls to external APIs, whose results are likely to be context-dependent? In practice though, they can be resolved by some ad-hoc optimizations:
- the getOffsetBefore() algorithm is (almost) straightforward
- getElapsedCpuTime() returns strictly positive results, making the compare() operation predictable
- getFadingEdgeLength() returns small ints, less than 0x10000, so shifting the result right by 16 always yields 0
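The observations above can be checked with plain arithmetic. In this sketch, java_long_compare is a hypothetical Python stand-in for java.lang.Long.compare(), assumed to return -1, 0 or 1; the value 0 for getOffsetBefore("", 0) and the value 12 for getFadingEdgeLength() are taken from the optimizer shown in this post, not computed.

```python
def java_long_compare(x, y):
    """Python stand-in for java.lang.Long.compare(): -1, 0 or 1 (assumed)."""
    return (x > y) - (x < y)


# Any strictly positive elapsed CPU time compares as "greater than 0".
for elapsed in (1, 37, 2 ** 40):
    assert java_long_compare(elapsed, 0) == 1

# Any non-negative int below 0x10000 becomes 0 once shifted right by 16 bits.
for length in (0, 12, 0xFFFF):
    assert (length >> 16) == 0

# The three decryptor arguments from the example therefore reduce to constants.
arg0 = 0                                  # (char)getOffsetBefore("", 0), taken as 0
arg1 = 12 - java_long_compare(37, 0)      # 12 - 1
arg2 = (12 >> 16) + 798                   # 0 + 798
decryptor_args = (arg0, arg1, arg2)
```

The exact elapsed time or edge length does not matter: every value in the expected range leads to the same three constants, which is what makes these calls safe to replace statically.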
We will craft the following IR optimizer: (file RemoveDummyAndroidApiCalls.py)
from com.pnfsoftware.jeb.core.units.code.android.ir import AbstractDOptimizer, IDVisitor

class RemoveDummyAndroidApiCalls(AbstractDOptimizer):
  # note that we extend AbstractDOptimizer for convenience, instead of implementing IDOptimizer from scratch
  def perform(self):
    # create our instruction visitor
    vis = AndroidUtilityVisitor(self.ctx)
    # visit all the instructions of the IR CFG
    for insn in self.cfg.instructions():
      insn.visitInstruction(vis)
    # return the count of replacements
    return vis.cnt

class AndroidUtilityVisitor(IDVisitor):
  def __init__(self, ctx):
    self.ctx = ctx
    self.cnt = 0

  def process(self, e, parent, results):
    repl = None

    if e.isCallInfo():
      sig = e.getMethodSignature()

      # TextUtils.getOffsetBefore("", 0)
      if sig == 'Landroid/text/TextUtils;->getOffsetBefore(Ljava/lang/CharSequence;I)I' and e.getArgument(0).isImm() and e.getArgument(1).isImm():
        buf = e.getArgument(0).getStringValue(self.ctx.getGlobalContext())
        val = e.getArgument(1).toLong()
        if buf == '' and val == 0:
          repl = self.ctx.getGlobalContext().createInt(0)

      # Long.compare(xxx, 0)
      elif sig == 'Ljava/lang/Long;->compare(JJ)I' and e.getArgument(1).isImm() and e.getArgument(1).asImm().isZeroEquivalent():
        val0 = None
        arg0 = e.getArgument(0)
        if arg0.isCallInfo():
          sig2 = arg0.getMethodSignature()
          if sig2 == 'Landroid/os/Process;->getElapsedCpuTime()J':
            # elapsed time always >0, value does not matter since we are comparing against 0
            val0 = 1
        if val0 != None:
          if val0 > 0:
            r = 1
          elif val0 < 0:
            r = -1
          else:
            r = 0
          repl = self.ctx.getGlobalContext().createInt(r)

      # ViewConfiguration.getFadingEdgeLength()
      elif sig == 'Landroid/view/ViewConfiguration;->getFadingEdgeLength()I':
        # always a small positive integer, normally set to FADING_EDGE_LENGTH (12)
        repl = self.ctx.getGlobalContext().createInt(12)

    if repl != None and parent.replaceSubExpression(e, repl):
      # success (this visitor is pre-order, we need to report the replaced node)
      results.setReplacedNode(repl)
      self.cnt += 1
What does this code do?
- First, it enumerates and visits all CFG instructions.
- The visitor checks for IDCallInfo IR expressions matching the kinds of Android framework API calls described above: getOffsetBefore(), compare(getElapsedCpuTime(), 0), getFadingEdgeLength().
- It evaluates and calculates the results, and replaces IR call expressions (IDInvokeInfo) with newly created constants.
The resulting IR, which the plugin could print, would look like:
throw new java.lang.IllegalStateException(o.isUserRecoverableError.read(((char)0), 12 - 1, 0 + 798).intern());
Subsequently, other optimizers built into dexdec can kick in, clean the code further (e.g. fold constants), and make the read() invocation a candidate for string auto-decryption.
The sample script can be found in your coreplugins/scripts folder. Feel free to extend it further.
- dexdec IR plugins can also be written as Java source. Have a look at the sample file DOptSampleJava.java, located in coreplugins/scripts. As a plugin grows in size and complexity, working with a strongly-typed language like Java, coupled with excellent Javadoc integration in IDEs, becomes extremely valuable.
- When prototyping IR plugins, the Dalvik code targeted for deobfuscation is oftentimes contained in a single method. In such cases, it may be cumbersome or costly to decompile entire classes. To easily decompile a single method in the GUI, do Decompile with Options (Action menu or right-click), and untick “Decompile top level container class”:
With this option disabled, when your caret is positioned on a method, issuing a decompilation request will only decompile the target method, and nothing else (not even inner classes/methods of the target will be decompiled).
- Using the previous technique, the generated decompiled view represents an AST IJavaMethod, not the usual IJavaClass. Fully-qualified names are used to represent types, since import statements are not specified. An added value of the views associated with such units lies in the “IR-CFG” fragment, representing the final (most refined) IR before the AST generation phase kicked in.
- Many IR utility routines are located in the DUtil class. More generally, explore the ir/ package’s Javadoc; you will find plenty of useful information in there.
- We haven’t talked about accessing and using the emulator and sandbox. The main interface is IDState, and we will detail some of its functionality in a later post. In the meantime, you will find sample code on our GitHub repo.