Writing dexdec IR optimizer plugins

Starting with JEB 4.2, users have the ability to instruct dexdec1 to load external Intermediate Representation (IR) optimizer plugins. 2

From a very high-level perspective, a Dex method scheduled for decompilation goes through the following processing pipeline:

  1. Dalvik method converted to low-level IR
  2. SSA transformation and Typing
  3. IR optimizations
  4. Final high-level IR converted to AST
  5. AST optimizations
  6. Final clean AST rendered as pseudo-Java code (NOTE: access to the AST is already possible via JEB’s Java AST API)

Phase 3 consists of repeatedly calling IR processors, that essentially take an input IR and transform it into another, further refined IR (that process is called “lifting”). IR processors range from junk code cleaner, to variable propagation, immediate propagation, constant folding, higher-level construct rebuilding, compound predicate rebuilding, code restructuring, to all sort of obfuscation removal, advanced optimizers that may involve emulation, dynamic or symbolic execution, etc.

By working at this level, power-users have the ability to write custom deobfuscators, that we may not be able to deliver as JEB built-ins for a variety of reasons (e.g. obfuscation specific to a single group of files, custom protection to files under NDA, etc.).

Sample dexdec IR script plugin applying custom deobfuscation to recover strings on a protected sample

A sample dexdec IR plugin

dexdec IR plugins are JEB back-end plugins (not front-end scripts). Therefore, they are to be dropped in the coreplugins folder (or coreplugins/scripts for plugin scripts). They can be written as:

  • Precompiled jar files: the source language can be anything that compiles to Java bytecode; those plugins cannot be hot-swapped, and therefore are not ideal for prototyping/experimenting; they are great for mature plugins though.
  • Java plugin scripts: single Java source files. Strong typing and IDE integration (e.g. with Eclipse or IntelliJ) with Javadoc makes it ideal for developing complex plugins. Hot reload is supported. (They can be seamlessly modified while JEB is running, making them great for prototyping.)
  • Python plugin scripts: written in 2.7 syntax. Hot reload is supported. Restriction: unlike other plugins, an instance of a Python script plugin may be shared by multiple decompilation threads. Therefore, they must be thread-safe and support concurrency.

In this blog, we will show how to write a Python plugin script. Users familiar with JEB client scripting will be in familiar territory.

IMPORTANT! Note that loading such plugins is not enabled by default in JEB. Add the following line to your bin/jeb-engines.cfg file to enable loading Python plugins: .LoadPythonPlugins = true

dexdec ir plugins must implement the IDOptimizer interface. In practice, it is highly recommended to extend the implementing class AbstractDOptimizer, like this:

from com.pnfsoftware.jeb.core.units.code.android.ir import AbstractDOptimizer

# sample IR plugin, does nothing but log the IR CFG
class DOptSamplePython(AbstractDOptimizer):

  # perform() returns the number of optimizations performed
  def perform(self):
    self.logger.info('MARKER - Input IR-CFG: %s', self.cfg)
    return 0

IMPORTANT! All dexdec IR public interfaces and types are located in the com.pnfsoftware.jeb.core.units.code.android.ir package. Keep a tab opened on this page while you develop IR plugins!

The skeleton above:

  • must have the same filename as the plugin class, therefore DOptSamplePython.py
  • must be dropped in coreplugins/scripts/
  • requires Python script plugins to be enabled in your engines configuration

If you haven’t done so, start JEB. Your plugin should appear in the list of dexdec plugins. Check the Android menu, Decompiler Plugins handler:

A list of external Dex decompiler plugins

Now load a dex/apk, and decompile any class. Your plugin will eventually be called. The logger view should attest to that by displaying multiple “MARKER – Input IR-CFG: …” lines.

dexdec Intermediate Representation

dexdec‘s IR consists of IDElement objects. Every IR statement is an IDInstruction, itself an IDElement. (All those types and their attributes are described in depth in the API doc.) When an IR plugin is called, it “receives” an IDMethodContext (representing a decompiled method), stored in the optimizer’s ctx public field. The IR CFG, a control flow graph consisting of IR statements, can be retrieved via ctx.getCfg(). It is also stored in the cfg public field, for convenience. A formatted IR CFG may look like this:

0000/2+  !onCreate(v4<com.pnfsoftware.raasta.AppHelp>, v5<android.os.Bundle>)<void>                            
0002/2:  !requestWindowFeature(v4<com.pnfsoftware.raasta.AppHelp>, 1)<boolean>                                 
0004/3:  !setContentView(v4<com.pnfsoftware.raasta.AppHelp>, 7F030000)<void>                                   
0007/5:  !x4<android.webkit.WebView> = ((android.webkit.WebView)findViewById(v4<com.pnfsoftware.raasta.AppHelp>, 7F070000)<android.view.View>)<android.webkit.WebView>  
000C/2:  !loadData(x4<android.webkit.WebView>, getString(v4<com.pnfsoftware.raasta.AppHelp>, 7F05005B)<java.lang.String>, "text/html", "utf-8")<void>  
000E/3:  !setBackgroundColor(x4<android.webkit.WebView>, 0)<void>                                              
0011/1:  !setDefaultTextEncodingName(getSettings(x4<android.webkit.WebView>)<android.webkit.WebSettings>, "utf-8")<void>  
0012/1:  return         

Statements (IDInstruction) can have any of the following opcodes (see DOpcodeType):
– IR_NOP: no-operation
– IR_ASSIGN: assignment
– IR_INVOKE: invocation (including new object and new array construction)
– IR_JUMP: unconditional jump
– IR_JCOND: conditional jump
– IR_SWITCH: switch statement
– IR_RETURN: return statement
– IR_THROW: throw statement
– IR_STORE_EXCEPTION: exception retrieval (special)
– IR_MONITOR_ENTER: VM monitor acquisition
– IR_MONITOR_EXIT: VM monitor release

Statement operands are themselves IDElements, usually IDExpressions. Examples: IDImm (immediate values), IDVar (variables), IDOperation (arithmetic/bitwise/cast operations), IDInvokeInfo (method invocation details), IDArrayElt (representing array elements), IDField (representing static or instance fields), etc. Refer to the hierarchy of IDElement for a complete list.

IR statements can be seen as recursive IR expression trees. They can be easily explored (visitXxx method()) and manipulated. They can be replaced by newly-created elements (see IDMethodContext.createXxx methods). Data-flow analysis can be performed on IR CFG, to retrieve use-def and def-use chains, and other variable liveness and reachability information (see cfg.doDataFlowAnalysis).

Use-case: cleaning useless Android calls

Let’s put this new API to practical, real-world use. First, some background: JEB ships with emulator-backed IR optimizers that attempt to auto-decrypt immediates such as strings. While this deobfuscator generally performs well on protected files, lately, we’ve received samples for which strings were not decrypted. The reason is quite straight-forward, see this example:

throw new java.lang.IllegalStateException(o.isUserRecoverableError.read(((char)android.text.TextUtils.getOffsetBefore("", 0)), 12 - java.lang.Long.compare(android.os.Process.getElapsedCpuTime(), 0L), (android.view.ViewConfiguration.getFadingEdgeLength() >> 16) + 798).intern());

In the above code (extracted from a protected method), read is a string decryptor. Alas, the presence of calls such as:

  • TextUtils.getOffsetBefore(“”, 0))
  • Long.compare(Process.getElapsedCpuTime(), 0L)
  • ViewConfiguration.getFadingEdgeLength() >> 16

prevent the generic decryptor from kicking in. Indeed, what would an emulator be supposed to make with those calls to external APIs, whose result is likely to be context-dependent? In practice though, they could be resolved by some ad-hoc optimizations:

  • getOffsetBefore() algorithm is (almost) straightforward
  • getElapsedCpuTime() also returns strictly positive results, making compare() operation predictable
  • getFadingEdgeLength() returns small ints, less than 0x10000

We will craft the following IR optimizer: (file RemoveDummyAndroidApiCalls.py)

from com.pnfsoftware.jeb.core.units.code.android.ir import AbstractDOptimizer, IDVisitor

class RemoveDummyAndroidApiCalls(AbstractDOptimizer):  # note that we extend AbstractDOptimizer for convenience, instead of implementing IDOptimizer from scratch
  def perform(self):
    # create our instruction visitor
    vis = AndroidUtilityVisitor(self.ctx)
    # visit all the instructions of the IR CFG
    for insn in self.cfg.instructions():
      insn.visitInstruction(vis)
    # return the count of replacements
    return vis.cnt

class AndroidUtilityVisitor(IDVisitor):
  def __init__(self, ctx):
    self.ctx = ctx
    self.cnt = 0

  def process(self, e, parent, results):
    repl = None

    if e.isCallInfo():
      sig = e.getMethodSignature()

      # TextUtils.getOffsetBefore("", 0)
      if sig == 'Landroid/text/TextUtils;->getOffsetBefore(Ljava/lang/CharSequence;I)I' and e.getArgument(0).isImm() and e.getArgument(1).isImm():
        buf = e.getArgument(0).getStringValue(self.ctx.getGlobalContext())
        val = e.getArgument(1).toLong()
        if buf == '' and val == 0:
          repl = self.ctx.getGlobalContext().createInt(0)

      # Long.compare(xxx, 0)
      elif sig == 'Ljava/lang/Long;->compare(JJ)I' and e.getArgument(1).isImm() and e.getArgument(1).asImm().isZeroEquivalent():
        val0 = None
        arg0 = e.getArgument(0)
        if arg0.isCallInfo():
          sig2 = arg0.getMethodSignature()
          if sig2 == 'Landroid/os/Process;->getElapsedCpuTime()J':
            # elapsed time always >0, value does not matter since we are comparing against 0
            val0 = 1
        if val0 != None:
          if val0 > 0:
            r = 1
          elif val0 < 0:
            r = -1
          else:
            r = 0
          repl = self.ctx.getGlobalContext().createInt(r)

      # ViewConfiguration.getFadingEdgeLength()
      elif sig == 'Landroid/view/ViewConfiguration;->getFadingEdgeLength()I':
        # always a small positive integer, normally set to FADING_EDGE_LENGTH (12)
        repl = self.ctx.getGlobalContext().createInt(12)

    if repl != None and parent.replaceSubExpression(e, repl):
      # success (this visitor is pre-order, we need to report the replaced node)
      results.setReplacedNode(repl)
      self.cnt += 1

What does this code do:
– First, it enumerates and visits all CFG instructions.
– The visitor checks for IDCallInfo IR expressions matching the kinds of Android framework API calls described above: getOffsetBefore(), compare(getElapsedCpuTime(), 0), getFadingEdgeLength()
– It evaluates and calculates the results, and replaces IR call expressions (IDInvokeInfo) by newly-created constants (IDImm).

The resulting IR, which the plugin could print, would look like:

throw new java.lang.IllegalStateException(o.isUserRecoverableError.read(((char)0, 12 - 1, 0 + 798).intern());

Subsequently, other optimizers, built into dexdec, can kick in, clean the code further (e.g. fold constants), and make the read() invocation a candidate for string auto-decryption, yielding the following result:

Our external IR plugin is enabled. The IR can be cleaned, the auto-decryption takes place.

Done!

The sample script can be found in your coreplugins/scripts folder. Feel free to extend it further.

Tips

  • dexdec IR plugins can also be written as Java source. Have a look at the sample file DOptSampleJava.java, located in coreplugins/scripts. As a plugin grows in size and complexity, working with a strongly-typed language like Java, coupled with excellent javadoc integration in IDE, becomes extremely valuable.
  • When prototyping IR plugins, the Dalvik code targeted for deobfuscation is oftentimes contained in a single method. In such cases, it may be cumbersome or costly to decompile entire classes. To easily decompile a single method in the GUI, do Decompile with Options (Action menu or right-click), and untick “Decompile top level container class”:

With this option disabled, when your caret is positioned on a method, issuing a decompilation request will only decompile the target method, and nothing else (not even inner classes/methods of the target will be decompiled.)

  • Using the previous technique, the generated decompiled view represents an AST IJavaMethod — not the usual IJavaClass. Fully-qualified names are used to represent types, since import statements are not specified. An added value to the views associated with such units lies in the “IR-CFG” fragment, representing the final (most refined) IR before the AST generation phase kicked in:
Final IR viewed in the source unit for an IJavaMethod
  • Many IR utility routines are located in the DUtil class. Generally, explore the ir/ package’s javadoc, you will find plenty useful information in there.
  • We haven’t talked about accessing and using the emulator and sandbox. The main interface is IDState, and we will detail some of its functionality in a later post. In the meantime, you will find sample code on our GitHub repo.

That’s it for now – Have fun crafting your own IR plugins. As usual, reach us on Twitter’s @jebdec, Slack’s jebdecompiler, or privately over email. Until next time! – Nicolas

  1. dexdec is JEB’s Dex/Dalvik decompiler; gendec is JEB’s generic decompiler for all other architectures (x86, arm, etc.).
  2. Note that gendec has been allowing that for quite some time; its IR is different than dexdec‘s IR though.

JEB 4.1 is available

JEB version 4 has been in the making for over a year and half! The Beta has been available for all users for the past 5 months. Thank you for the feedback that many of you provided, it helped iron things out. The list below is a non-exhaustive changelog of additions since JEB 3.28.2.

Finally, on a related note: JEB 4.2 is around the corner already, with significant API additions that allow more control over the dex decompilation pipeline, in particular support for dexdec IR (Intermediate Representation) plugins. They unlock the possibility to write IR optimizers to thwart complex obfuscations. Stay tuned!

Now for a list of 3.28-to-4.1 changes:

Core changes (high-level)

  • gendec: JEB’s generic decompiler for all architectures but dex (i.e. x86, arm, mips, ethereum, wasm, etc.) received many important upgrades, and was one of the major focus for v4
  • dexdec: JEB’s dex/dalvik decompiler received important additions, most notably an emulator coupled with a custom sandbox that allows the generic auto-decryption and deobfuscation of data and code
  • native code analysis: upgrades, incl. performance, more analysis options, better switch recognition, tail-calls detection, etc.
  • debuggers: updates and support for dynamic addition of native code units
  • siglibs (library code recognition): updates for Android NDK, MSVC
  • ‘codeless’ siglibs: see blog post; added codeless signatures for OpenSSL and libcurl
  • typelibs (type libraries): updates
  • x86: added MSVC exception parsing
  • decompiler API: many additions for scripts/plugins to perform finer-grained decompilations
  • dex: context-information database to specify context-sensitivity and side-effects for methods, to allow better optimizations
  • dex: support for method and class moving (to classes/methods), for ex. allowing the creation of anonymous classes (previously, only class-to-package moving was supported)
  • dex: better obfuscated enum reconstruction
  • dex API: additions
  • comment manager: support inline, header (i.e. pre, above) and meta comments
  • JDB2 databases: upgraded the serialization process, now reliable on very large projects (previously could trigger OOM errors if -Xss had a low value)
  • Miscellaneous performance improvements, fixes and tweaks

Specific to gendec

  • x86/x64: decompiler plugin upgrades (incl. support for x87, mmx, sse, and supplementary ISAs)
  • arm/aarch64: decompiler plugin upgrades (incl. more opcodes and additional ISAs)
  • mips/mips64: decompiler plugin upgrades (incl. more opcodes and additional ISAs)
  • evm (ethereum): decompiler plugin upgrades (incl. newest opcodes, precompiled contracts 5-8, more routine hashes, etc.)
  • wasm: decompiler plugin upgrades (incl. floating-point instructions, br_table, conversion instructions, etc.)
  • added more IR optimizers
  • Pseudo-class recognition and reconstruction
  • API additions
  • IR pattern matching and replacing; IR compiler; IR emulator

Specific to dexdec

  • deobfuscator: virtualization deobfuscation (limited; see blog post)
  • deobfuscator: added Control-Flow flattening deobfuscation (limited)

UI Client changes

  • Omnibox for project-wide searches (F3)
  • Python script manager, script editor (F2)
  • Smart auto-completion, history-assisted text fields in most dialogs
  • Better Options/Configuration panels, for back-end and front-end properties
  • Support for Favorites/Bookmarks (F12)
  • Text line highlights (Ctrl/Command + M)
  • Code comments: support for inline comment, header comments, meta comments
  • Many more native code analysis widgets (see Native menu)
  • Android analysis: added easy dex merger widget; added customizable context-information database
  • Simpler Export facility (File, Export menu)
  • Explorer-like view for Folder units (with thumbnails)
  • Updated the Dark theme and Light theme
  • Progress indicator during save to/load from JDB2
  • API additions, incl. the Graphing API (see GraphDemo?.py sample scripts)
  • Better code replacement option (Action, Replace)
  • Debugger: register live code unit addition, during a debugging session
  • Table/tree filter: reverse matching with TILDA-prefixed filter string
  • Tip of the day (Help menu)
  • Floating builds: allow different controllers (different licenses) to run on the same machine
  • Non-floating builds: improved the update process, allow different licenses to run on the same machine smoothly
  • Miscellaneous performance improvements, fixes and tweaks
  • The client can now run on arm64 platforms (macOS and Linux/GTK)
  • Java version: minimum Java 8; recommended Java 11+

Download information:

Thanks & have a great summer!