Writing dexdec IR optimizer plugins

Starting with JEB 4.2, users have the ability to instruct dexdec [1] to load external Intermediate Representation (IR) optimizer plugins. [2]

From a very high-level perspective, a Dex method scheduled for decompilation goes through the following processing pipeline:

  1. Dalvik method converted to low-level IR
  2. SSA transformation and Typing
  3. IR optimizations
  4. Final high-level IR converted to AST
  5. AST optimizations
  6. Final clean AST rendered as pseudo-Java code (NOTE: access to the AST is already possible via JEB’s Java AST API)

Phase 3 consists of repeatedly calling IR processors, which take an input IR and transform it into another, further refined IR (a process called “lifting”). IR processors range from junk code cleaners to variable propagation, immediate propagation, constant folding, higher-level construct rebuilding, compound predicate rebuilding, code restructuring, all sorts of obfuscation removal, and advanced optimizers that may involve emulation or dynamic or symbolic execution.
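The repeated invocation of IR processors can be pictured as a fixed-point loop. The sketch below is a toy model in plain Python, not dexdec's actual scheduler: the tuple-based IR, the two passes (remove_nops, neutralize_self_assignments) and run_until_stable are hypothetical names made up for illustration. The only idea borrowed from the API is that each pass reports how many changes it made.

```python
# Toy model only: a "method" is a list of (opcode, operand) tuples.
# None of these names belong to the JEB API.

def remove_nops(ir):
    """Junk-code cleaner: drop ('nop', None) statements in place."""
    before = len(ir)
    ir[:] = [stmt for stmt in ir if stmt[0] != 'nop']
    return before - len(ir)

def neutralize_self_assignments(ir):
    """Turn ('assign', (dst, src)) with dst == src into a nop."""
    count = 0
    for i, (op, arg) in enumerate(ir):
        if op == 'assign' and arg[0] == arg[1]:
            ir[i] = ('nop', None)
            count += 1
    return count

def run_until_stable(ir, passes, max_rounds=100):
    """Call every pass repeatedly until a full round changes nothing."""
    total = 0
    for _ in range(max_rounds):
        changed = sum(p(ir) for p in passes)
        if changed == 0:
            break
        total += changed
    return total

method_ir = [
    ('nop', None),
    ('assign', ('v0', 'v0')),
    ('assign', ('v1', 'v0')),
    ('return', None),
]
total_changes = run_until_stable(method_ir, [remove_nops, neutralize_self_assignments])
```

The nop created by the second pass is only removed on the next round, which is why a single pass over the list of optimizers would not be enough in this model.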

By working at this level, power users can write custom deobfuscators that we may not be able to deliver as JEB built-ins for a variety of reasons (e.g. obfuscation specific to a single group of files, custom protection on files under NDA, etc.).

Sample dexdec IR script plugin applying custom deobfuscation to recover strings on a DexGuard (9.1+?) -protected sample

A sample dexdec IR plugin

dexdec IR plugins are JEB back-end plugins (not front-end scripts). Therefore, they are to be dropped in the coreplugins folder (or coreplugins/scripts for plugin scripts). They can be written as:

  • Precompiled jar files: the source language can be anything that compiles to Java bytecode; those plugins cannot be hot-swapped, and therefore are not ideal for prototyping/experimenting; they are great for mature plugins though.
  • Python plugin scripts: written in 2.7 syntax. Hot reload is supported: they can be seamlessly modified while JEB is running, making them great for prototyping.
  • Java plugin scripts: single Java source files; similar to Python scripts. Strong typing and IDE integration (e.g. with Eclipse or IntelliJ) with Javadoc make them ideal for developing complex plugins. Hot reload is supported as well.

In this blog post, we will show how to write a Python plugin script. Users familiar with JEB client scripting will be in familiar territory.

IMPORTANT! Note that loading such plugins is not enabled by default in JEB. Add the following line to your bin/jeb-engines.cfg file to enable loading Python plugins: .LoadPythonPlugins = true

dexdec IR plugins must implement the IDOptimizer interface. In practice, it is highly recommended to extend the implementing class AbstractDOptimizer, like this:

from com.pnfsoftware.jeb.core.units.code.android.ir import AbstractDOptimizer

# sample IR plugin, does nothing but log the IR CFG
class DOptSamplePython(AbstractDOptimizer):

  # perform() returns the number of optimizations performed
  def perform(self):
    self.logger.info('MARKER - Input IR-CFG: %s', self.cfg)
    return 0

IMPORTANT! All dexdec IR public interfaces and types are located in the com.pnfsoftware.jeb.core.units.code.android.ir package. Keep a tab open on this page while you develop IR plugins!

The skeleton above:

  • must have the same filename as the plugin class, therefore DOptSamplePython.py
  • must be dropped in coreplugins/scripts/
  • requires Python script plugins to be enabled in your engines configuration

If you haven’t done so, start JEB. Your plugin should appear in the list of dexdec plugins. Check the Android menu, Decompiler Plugins handler:

A list of external Dex decompiler plugins

Now load a dex/apk, and decompile any class. Your plugin will eventually be called. The logger view should attest to that by displaying multiple “MARKER - Input IR-CFG: …” lines.

dexdec Intermediate Representation

dexdec’s IR consists of IDElement objects. Every IR statement is an IDInstruction, itself an IDElement. (All those types and their attributes are described in depth in the API doc.) When an IR plugin is called, it “receives” an IDMethodContext (representing a decompiled method), stored in the optimizer’s ctx public field. The IR CFG, a control flow graph consisting of IR statements, can be retrieved via ctx.getCfg(). It is also stored in the cfg public field, for convenience. A formatted IR CFG may look like this:

0000/2+  !onCreate(v4<com.pnfsoftware.raasta.AppHelp>, v5<android.os.Bundle>)<void>                            
0002/2:  !requestWindowFeature(v4<com.pnfsoftware.raasta.AppHelp>, 1)<boolean>                                 
0004/3:  !setContentView(v4<com.pnfsoftware.raasta.AppHelp>, 7F030000)<void>                                   
0007/5:  !x4<android.webkit.WebView> = ((android.webkit.WebView)findViewById(v4<com.pnfsoftware.raasta.AppHelp>, 7F070000)<android.view.View>)<android.webkit.WebView>  
000C/2:  !loadData(x4<android.webkit.WebView>, getString(v4<com.pnfsoftware.raasta.AppHelp>, 7F05005B)<java.lang.String>, "text/html", "utf-8")<void>  
000E/3:  !setBackgroundColor(x4<android.webkit.WebView>, 0)<void>                                              
0011/1:  !setDefaultTextEncodingName(getSettings(x4<android.webkit.WebView>)<android.webkit.WebSettings>, "utf-8")<void>  
0012/1:  return         

Statements can have any of the following opcodes (see DOpcodeType): IR_NOP, IR_ASSIGN, IR_INVOKE, IR_JUMP, IR_JCOND, IR_SWITCH, IR_RETURN, IR_THROW, IR_STORE_EXCEPTION, IR_MONITOR_ENTER, IR_MONITOR_EXIT.

Statement operands are themselves IDElements, usually IDExpressions. Examples: IDImm (immediate values), IDVar (variables), IDOperation (arithmetic/bitwise/cast operations), IDInvokeInfo (method invocation details), IDArrayElt (representing array elements), IDField (representing static or instance fields), etc. Refer to the hierarchy of IDElement for a complete list.

IR statements can be seen as recursive IR expression trees. They can be easily explored (see the visitXxx() methods) and manipulated. Their sub-expressions can be replaced by newly-created elements (see the IDMethodContext.createXxx methods). Data-flow analysis can be performed on the IR CFG to retrieve use-def and def-use chains, as well as other variable liveness and reachability information (see cfg.doDataFlowAnalysis).
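To make the "recursive expression tree" idea concrete, here is a toy model in plain Python. It does not use the dexdec API: Node, replace_child and visit_preorder are hypothetical stand-ins that only imitate the general shape of a pre-order visit in which a callback may swap a sub-expression for a new element.

```python
# Toy model only: Node, replace_child and visit_preorder are hypothetical
# stand-ins and are not part of the dexdec API.

class Node(object):
    def __init__(self, kind, value=None, children=None):
        self.kind = kind                  # e.g. 'call', 'imm', 'op'
        self.value = value                # method name, constant or operator
        self.children = list(children or [])

    def replace_child(self, old, new):
        """Swap a direct child for another node; True on success."""
        for i, child in enumerate(self.children):
            if child is old:
                self.children[i] = new
                return True
        return False

def visit_preorder(node, parent, callback):
    """Visit node before its children; the callback may return a replacement."""
    repl = callback(node, parent)
    if repl is not None and parent is not None and parent.replace_child(node, repl):
        node = repl                       # continue the walk from the new node
    for child in list(node.children):
        visit_preorder(child, node, callback)

# (getFadingEdgeLength() >> 16) + 798, written as a tree
call = Node('call', 'getFadingEdgeLength')
shift = Node('op', '>>', [call, Node('imm', 16)])
root = Node('op', '+', [shift, Node('imm', 798)])

replaced = []

def replace_call(node, parent):
    """Replace the call node by an immediate, leave everything else alone."""
    if node.kind == 'call' and node.value == 'getFadingEdgeLength':
        replaced.append(node)
        return Node('imm', 12)
    return None

visit_preorder(root, None, replace_call)
```

Because the walk is pre-order, it has to resume from the replacement node rather than from the node that was just detached from the tree.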

Use-case: cleaning useless Android calls

Let’s put this new API to practical, real-world use. First, some background: JEB ships with emulator-backed IR optimizers that attempt to auto-decrypt immediates such as strings. While this deobfuscator generally performs well on DexGuard-protected files, we have lately received samples for which strings were not decrypted. The reason is quite straightforward; see this example:

throw new java.lang.IllegalStateException(o.isUserRecoverableError.read(((char)android.text.TextUtils.getOffsetBefore("", 0)), 12 - java.lang.Long.compare(android.os.Process.getElapsedCpuTime(), 0L), (android.view.ViewConfiguration.getFadingEdgeLength() >> 16) + 798).intern());

In the above code (extracted from a protected method), read is a string decryptor. Alas, the presence of calls such as:

  • TextUtils.getOffsetBefore("", 0)
  • Long.compare(Process.getElapsedCpuTime(), 0L)
  • ViewConfiguration.getFadingEdgeLength() >> 16

prevents the generic decryptor from kicking in. Indeed, what is an emulator supposed to make of those calls to external APIs, whose results are likely to be context-dependent? In practice though, they could be resolved by some ad-hoc optimizations:

  • getOffsetBefore() algorithm is (almost) straightforward
  • getElapsedCpuTime() always returns strictly positive results, making the compare() operation predictable
  • getFadingEdgeLength() returns small ints, less than 0x10000
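The reasoning above can be checked with plain arithmetic. The sketch below is ordinary Python with no JEB dependency. long_compare is a hypothetical helper that assumes Long.compare yields -1, 0 or 1, and the sample CPU times and edge lengths are made-up values chosen only to satisfy the stated assumptions (strictly positive, below 0x10000).

```python
def long_compare(x, y):
    """Hypothetical helper: -1, 0 or 1, the values assumed here for Long.compare."""
    return (x > y) - (x < y)

# Assumed sample values: any strictly positive CPU time gives the same result.
for cpu_time in (1, 37, 10 ** 9):
    assert long_compare(cpu_time, 0) == 1

# Assumed sample values: any length below 0x10000 vanishes after >> 16.
for edge_length in (0, 12, 0xFFFF):
    assert edge_length >> 16 == 0

# Arguments of read() once the three calls are resolved:
# (char)0, 12 - compare(positive, 0), (small int >> 16) + 798
decryptor_args = (0, 12 - long_compare(1, 0), (12 >> 16) + 798)
```

Under those assumptions the decryptor always receives the same three constants, whatever the device reports at runtime.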

We will craft the following IR optimizer: (file DGReplaceApiCalls.py)

from com.pnfsoftware.jeb.core.units.code.android.ir import AbstractDOptimizer, IDVisitor

class DGReplaceApiCalls(AbstractDOptimizer):  # note that we extend AbstractDOptimizer for convenience, instead of implementing IDOptimizer from scratch
  def perform(self):
    # create our instruction visitor
    vis = AndroidUtilityVisitor(self.ctx)
    # visit all the instructions of the IR CFG
    for insn in self.cfg.instructions():
      insn.visitInstruction(vis)
    # return the count of replacements
    return vis.cnt

class AndroidUtilityVisitor(IDVisitor):
  def __init__(self, ctx):
    self.ctx = ctx
    self.cnt = 0

  def process(self, e, parent, results):
    repl = None

    if e.isCallInfo():
      sig = e.getMethodSignature()

      # TextUtils.getOffsetBefore("", 0)
      if sig == 'Landroid/text/TextUtils;->getOffsetBefore(Ljava/lang/CharSequence;I)I' and e.getArgument(0).isImm() and e.getArgument(1).isImm():
        buf = e.getArgument(0).getStringValue(self.ctx.getGlobalContext())
        val = e.getArgument(1).toLong()
        if buf == '' and val == 0:
          repl = self.ctx.getGlobalContext().createInt(0)

      # Long.compare(xxx, 0)
      elif sig == 'Ljava/lang/Long;->compare(JJ)I' and e.getArgument(1).isImm() and e.getArgument(1).asImm().isZeroEquivalent():
        val0 = None
        arg0 = e.getArgument(0)
        if arg0.isCallInfo():
          sig2 = arg0.getMethodSignature()
          if sig2 == 'Landroid/os/Process;->getElapsedCpuTime()J':
            # elapsed time always >0, value does not matter since we are comparing against 0
            val0 = 1
        if val0 is not None:
          if val0 > 0:
            r = 1
          elif val0 < 0:
            r = -1
          else:
            r = 0
          repl = self.ctx.getGlobalContext().createInt(r)

      # ViewConfiguration.getFadingEdgeLength()
      elif sig == 'Landroid/view/ViewConfiguration;->getFadingEdgeLength()I':
        # always a small positive integer, normally set to FADING_EDGE_LENGTH (12)
        repl = self.ctx.getGlobalContext().createInt(12)

    if repl is not None and parent.replaceSubExpression(e, repl):
      # success (this visitor is pre-order, we need to report the replaced node)
      results.setReplacedNode(repl)
      self.cnt += 1

What this code does:

  • First, it enumerates and visits all CFG instructions.
  • The visitor checks for IDCallInfo IR expressions matching the kinds of Android framework API calls described above: getOffsetBefore(), compare(getElapsedCpuTime(), 0), getFadingEdgeLength().
  • It evaluates and calculates the results, and replaces the IR call expressions (IDInvokeInfo) with newly-created constants (IDImm).

The resulting IR, which the plugin could print, would look like:

throw new java.lang.IllegalStateException(o.isUserRecoverableError.read(((char)0), 12 - 1, (12 >> 16) + 798).intern());

Subsequently, other optimizers, built into dexdec, can kick in, clean the code further (e.g. fold constants), and make the read() invocation a candidate for string auto-decryption, yielding the following result:

Our external IR plugin is enabled. The IR can be cleaned, and the auto-decryption takes place.
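The constant folding mentioned above can be illustrated with a toy folder over nested tuples. This is plain Python and unrelated to dexdec's built-in optimizers; the tuple encoding, the OPS table and fold are assumptions made for illustration only.

```python
import operator

# Toy encoding (an assumption, not dexdec's IR): an expression is either a
# leaf (int constant or variable name) or a tuple (symbol, left, right).
OPS = {'+': operator.add, '-': operator.sub, '>>': operator.rshift}

def fold(expr):
    """Recursively fold sub-expressions whose operands are all constants."""
    if not isinstance(expr, tuple):
        return expr
    sym, left, right = expr
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[sym](left, right)
    return (sym, left, right)

# Arguments of read() after the API calls were replaced by constants
args = [0, ('-', 12, 1), ('+', ('>>', 12, 16), 798)]
folded = [fold(a) for a in args]

# A sub-expression that still contains an unknown is only partially folded
partial = fold(('+', ('-', 5, 2), 'v0'))
```

Once every argument is a plain constant, a call like read() becomes something an emulator-backed decryptor can evaluate without any runtime context.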

Done!

The DGReplaceApiCalls.py script can be found in your coreplugins/scripts folder. Feel free to extend it further. It appears that recent versions of DexGuard make extensive use of these tricks to thwart auto-deobfuscators.

Tips

  • dexdec IR plugins can also be written as Java source. Have a look at the sample file DOptSampleJava.java, located in coreplugins/scripts. As a plugin grows in size and complexity, working with a strongly-typed language like Java, coupled with excellent Javadoc integration in IDEs, becomes invaluable.
  • When prototyping IR plugins, the work usually takes place on a single method. It may be cumbersome (and sometimes costly, especially when working on obfuscated code) to decompile entire classes. To easily decompile a single method in the GUI, do Decompile with Options (Action menu or right-click), and untick “Decompile top level container class”:

With this option disabled, when your caret is positioned on a method, issuing a decompilation request will only decompile the target method, and nothing else (not even inner classes or methods of the target will be decompiled).

  • Using the previous technique, the generated decompiled view represents an AST IJavaMethod, not the usual IJavaClass. Fully-qualified names are used to represent types, since import statements are not specified. An added value of the views associated with such units lies in the “IR-CFG” fragment, representing the final (most refined) IR before the AST generation phase kicked in:

Final IR viewed in the source unit for an IJavaMethod

  • Many IR utility routines are located in the DUtil class. More generally, explore the ir/ package’s Javadoc; you will find plenty of useful information in there.

That’s it for now. We’ll publish more about this in the summer. Have fun crafting your own IR plugins. As usual, reach us on Twitter’s @jebdec, Slack’s jebdecompiler, or privately over email. Until next time! – Nicolas

  1. dexdec is JEB’s Dex/Dalvik decompiler; gendec is JEB’s generic decompiler for all other architectures (x86, arm, etc.).
  2. Note that gendec has been allowing that for quite some time; its IR is different from dexdec’s IR though.

Published by

Nicolas Falliere

Author of JEB.
