Reversing Simatic S7 PLC Programs

This article is a guide to reverse engineer Simatic S7 PLC program blocks. 1

Last revision: May 10 2022.

Introduction

PLC (Programmable Logic Controllers) are specialized computers designed to control industrial systems having real-time processing requirements. They take inputs provided by sensors and generate outputs for actuators. As programmable devices, they execute user-provided software and therefore are susceptible to some classes of software attacks. The most publicized demonstration of that was made by the Stuxnet malware, whose end-goal was to take control, damage, and destroy arrays of centrifuges in a uranium enrichment plant. The analysis of the malicious PLC payload proved to be a long and tedious road 2, and up to this day, tooling and knowledge related to those systems remain limited relative to broadly-known architectures such as x86 or arm.

We attempt to bridge some of this gap by providing S7 analysis modules for JEB Pro. This article shows how they can be used to acquire, analyze, disassemble and decompile PLC program blocks intended to run on Siemens Simatic S7-300 and S7-400 devices, a very popular line of PLC used to operate industrial processes.

Terminology

Throughout the rest of this document, the terms PLC, S7 or S7 PLC are used interchangeably to refer to S7-300 or S7-400 PLC devices. Newer devices in the S7 product line, namely the S7-1200 and S7-1500, are not supported by this JEB extension and won’t be considered here.

Models of Simatic S7-300 (left side) and Simatic S7-400 (right side) – Image (c) Siemens

The official IDE used to program S7 PLC is called Step 7. Step 7 may be used as-is or as a part of the larger software suite Totally Integrated Automation (TIA).

A PLC program is made of blocks, such as data blocks, function blocks, and organization blocks. In this document, the term program may be understood as (collection of) blocks.

A program is downloaded to a PLC from a Programming Station, that is, a Windows-based computer running the Step 7 editor. When a program is retrieved from a PLC, it is uploaded to the programming station.

The assembly language STL (Statements List) and its bytecode counterpart, MC7, are sometimes used interchangeably.

Finally, the names Simatic, Step 7, and Totally Integrated Automation are trademarks of Siemens AG (“Siemens”).

Primer on S7

This section briefly presents what S7 programs are, their structure, as well as lower level details important to know from a reverse engineering perspective.

Programming Environment

S7 PLC are programmed using Step 7 or TIA’s Step 7 (TIA is a platform required to program the most recent S7 devices), the IDE running on a Windows computer referred to as the Programming Device. Once the program is written, it can be downloaded onto a physical PLC or a simulator program (such as PLCSIM, part of Step 7).

Blocks

A PLC program is a collection of blocks. Blocks have a type (data, code, etc.) and a number.

  • Data blocks:
    • User data blocks are referred to as DB if they are shared by all code, or DI if they belong to a code block
    • System data blocks are named SDB
  • Code blocks, also called logic blocks:
    • Organization Blocks (OB) are program entry points, called by the firmware
      • The principal OB is OB1, the program’s main entry point. It is executed repeatedly by the firmware.
      • Other OB can be programmed and called when interruptions happen, exceptions occur, timers go off, etc.
    • Function blocks (FB) and System Function blocks (SFB) are routines operating on a provided data block, called the instance data block (DI)
    • Function (FC) and System Functions (SFC) are routines that do not require a data block to operate

The distinction between FB and FC is subtle. Any FB could be written to perform equivalently as an FC, and vice versa. They exist as an easy way to distinguish between a function working as-is, like a C routine would (FC), and a function working on a collection of pseudo-encapsulated attributes, like a C++ class method would (FB).

A sample program consisting of two OBs, one FB and its associated DB. Two system functions are used.
A larger program designed to run on a S7-300 CPU 314.

There are various ways to write PLC code. Programmers may choose to write ladder diagrams (LAD) or function block diagrams (FBD); complex processes may be better expressed in statements list (STL) or in a high-level Pascal-like language (SCL). Regardless of source languages, the program is compiled to MC7 bytecode, whose specifications are not public.

A piece of MC7 bytecode is packaged in a block, along with some metadata (authoring information, flags, etc.) and the interface of the block. The interface of a data block is the block definition itself, a structure type. The interface of a logic block is its set of inputs, outputs, local variables, as well as static variables in the case of a FB, or return value in the case of a FC.

Example code of a Function Block (FB1000) programmed in STL and an associated data block DB1000. Note that both blocks share the same interface (IN/OUT/IN_OUT/STAT(=static)) data. The TEMP data section of the FB holds transient locals.

MC7 Code

PLC may be programmed using a variety of methods, such as:

  • Ladder logic (LAD)
  • Function block diagrams (FBD)
  • Assembly-like statement list (STL)
  • Structured control language (SCL, a high-level Pascal-like language)
  • Other methods exist

Step 7 compiles all source codes to MC7 bytecode, a representation that will be translated and executed by a virtual machine running on the PLC.

STL was relatively well-documented up until the S7-400 3. However, the binary specifications are not public at the time of writing. 4

The MC7 instructions map STL statements, with several notable exceptions (e.g. STL’s CALL is translated to UC/CC with additional code to prepare the Address Register pointer, opened Data Block, set up parameters on the Locals memory area in the case of FC/SFC call, etc.).

Execution Environment

The execution environment for MC7 bytecode is the following:

  • Memory areas:
    • Digital input, called I (0 to 65536 addressable bytes)
    • Digital output, called Q (0 to 65536 addressable bytes)
    • Global memory, called M (0 to 65536 addressable bytes)
    • Local memory, called L (0 to 65536 addressable bytes)
      • A special area V references the local memory of the caller method, i.e. if function f1 calls function f2, V in f2 is L of f1
    • Shared data block bytes via the DB1 register, called DB
    • Instance data block bytes via the DB2 register, called DI
    • Timers, called T (256 addressable 16-bit timers)
    • Counters, called C (256 addressable 16-bit counters)
  • Registers:
    • A program counter PC, not directly accessible
      • The PC is modified by intra-routine branching instructions (JU/JL/JC/…)
    • A 16-bit Status Word register (only the 9 lower bits are used), from #0 to #8:
      • FC: First-Check: if 0, indicates that the boolean instruction to be executed is the first in a sequence of logic operations to be performed (“logic operation string”)
      • RLO: Result of Logic Operation: holds the result of the last executed bit logic operation
      • STA: Status: value of the current boolean address
      • OR: Determine how binary-and and binary-or are combined
      • OS: Overflow Stored: copy of the OV bit
      • OV: Overflow: set by integer/floating-point instruction on overflow
      • CC0/CC1: Condition Codes: updated by arithmetic instructions and comparison instructions (see arithmetic and branching instructions for details on how CC0/CC1 are set and used)
      • BR: Binary Result: can be used to store the RLO (via SAVE); is used by system functions (SFC/SFB) as a success(1)/error(0) indicator
    • Two 32-bit address registers (AR1/AR2)
      • The address register hold a MC7 4-byte pointer (see section on MC7 Types). The area part of the pointer may be ignored (for area-internal access), or may be used (for area-crossing access)
    • Two or four 32-bit accumulators (ACCU1/ACCU2, ACCU3/ACCU4 optionally)
    • Two data block registers, not directly accessible
Translation in JEB

JEB’s MC7 plugin mirrors the execution environment, and adds several synthetic (artificial) registers to help with MC7 code representation and code translation to IR for the decompiler. The processor details can be examined in the GUI client (menu Native, handler Processor Registers).

Registers defined by the MC7 processor plugin

Instruction Set

Familiarity with STL is a topic that PLC reverse engineers will need to get familiar with. However, a complete and detailed guide to general STL programming is outside the scope of this document. Specific STL instructions will be discussed as need-be.

The instructions are grouped into the following categories:

  • bit logic: not/and/or/xor/and-not/or-not/xor-not, RLO access, etc.
  • word logic: and/or/xor on words
  • integer ops: add/sub/mul/div/mod, on 16- or 32-bit ints
  • shift/rotate: self-explanatory
  • floating ops: iee754 fp32 operations
  • comparison: compare and set CC0/CC1
  • conversion: int to float, float to int, signed extensions, etc.
  • data block: open data blocks as shared/instance, etc.
  • load/transfer: read and write the accus and address regs
  • accumulator: specific accumulators instructions
  • logic control: jumps, unconditional or CC0/CC1-based
  • program control: sub-routine calls to FB/FC/SFB/SFC
  • counter/timer: manipulate timers and counters

A summarized html version 5 of the reference STL documentation can be found on our website:

If you are looking for a quick reference on some opcode, this page may be more handy than the full reference manual.
Operands

Instructions carry 0 or 1 operand. The operand type can be one of the following:

  • Access to some area bytes or a direct immediate:
    L MB 300: load the global byte at address 300 (decimal) into ACCU1
    L L#1000: load the double-integer value 1000 into ACCU1
  • Indirect access, optionally using AR1/AR2:
    • Area-internal: the area is hardcoded in the instruction (below, I)
      = I [MD 100]: assign RLO to the input bit at X, where X is the pointer located at offset 100 of the global memory (M)
      X I [AR1, P#30.4]: binary-xor RLO with the input bit located at *(AR1+30.4)
    • Area-crossing: the target area is determined dynamically
      AN [AR1, P#10.0]: binary-and-not RLO with the bit located at *(AR1+10.0), the target area is specified in the MSB of AR1
      T QW [AR2, P#2.0]: transfer ACCU1L to the word located at *(AR2+2.0)
  • A bit operation:
    A I 2.0: binary-and RLO with the input bit 2.0 (bit #0 of byte 2)
    O Q 40.4: binary-or RLO with the output bit 40.4
  • A branching immediate, in word units:
    JU 15: jump to “instruction address + 2 *15”
  • Parameter access (for FC calls):
    T Z#6.0: transfer ACCU1 to the third parameter
  • Implicit operands, zero or one:
    NOP 0
    NOP 1
Types

Interestingly, some instructions encode the type of operand immediate (this allows for unambiguous STL code rendering). Below is a list of examples with the L instruction, which loads ACCU1 with an immediate value. Note that the immediates are encoded big-endian:

TYPE      INSTRUCTION          BYTECODE      IMM. (BE, 8- 16- or 32- bit)

bin32     L 2#10101010         300200aa      0x00aa

dec16     L 1000               300303e8      0x03e8
dec32     L L#1000000          3803000f4240  0x000f4240
hex8      L B#16#45            2845          0x45
hex16     L W#16#6677          30076677      0x6677
hex32     L DW#16#11223344     380711223344  0x11223344

float32   L 3.14               38014048f5c3  0x4048f5c3

char1     L 'z'                3005007a      0x007a
char2     L 'ab'               30056162      0x6162
char4     L 'abcd'             380561626364  0x61626364

bytes2    L B#(3, 6)           30060306      0x0306
bytes4    L B#(3, 6, 7, 8)     380603060708  0x03060708

bcd       L C#345              30080345      0x345

pointer   L P#100.2            380400000322  0x00000322 (area NOT specified)
pointer   L P#M 10000.0        380483013880  0x83013880 (area specified)

time      L T#10s31ms          38090000272f  0x0000272f
date      L D#2022-4-25        300a2e1a      0x2e1a
tod       L TOD##16:20:59.100  380b03821e5c  0x03821e5c
s5t       L S5T#1m40s          300c2100      0x2100

The types used in STL or MC7 are described in the next section.

Bit operations, RLO and FC

Newcomers to STL may be baffled by this type of code:

// assume a new routine
A I 0.0  // 1. binary-and
A I 0.1  // 2. binary-and
= Q 1.0  // 3. assign the result (in RLO) to output bit 1.0

If "A <SRC>" means "RLO = RLO & <SRC>", what does line (1) do, and does it depend on the value of RLO at (1)? The general case answer is no. A more precise translation of A would be:

if FC == 0:
  RLO = SRC
  FC = 1
else:
  RLO = RLO & SRC

If the FC flag is false, RLO takes the value of the source bit. What is the value of FC then? At the beginning of a program, it is false (because the sub-routine dispatch instructions – such as UC – set it to 0). It is also set to false after an end-of-logic-string operation, such as = (assign the RLO to a destination).

Data and Interfaces

Every block, code or data, has an interface that defines…

  • for a data block: the structure of the data block itself
  • for a logic block: its parameters for invocation

FC Block Interface

The interface of an FC block consists of at most 4 sections. The order matters.

  • IN: Input parameters
  • RET: single return value
  • IN_OUT: input/output parameters
  • OUT: output parameters (any number of returned values)

FB Block Interface

The interface of an FB block consists of at most 4 sections (they are not the same as FC’s though). The order matters as well, since it determines the memory layout of the associated DB.

  • IN: input parameters
  • OUT: output parameters
  • IN_OUT: input/output parameters
  • STATIC: the static data (held by the associated instance DB, and laid out right after the parameter data, that is, IN/OUT/IN_OUT)

Local Area

The interface of a logic block may also defines a TEMP area, holding temporary local variables (area L). Note that the local storage, just like any other storage, may be accessed without the need to be defined in an interface. Example:

L LB 3  ; load the byte at 0x3 in local storage into ACCU1
T QB 4  ; transfer ACCU1 to the output byte at 0x4

In practice, L-variables are going to be defined for most user-generated code. However, many synthetic statements generated by the compiler for behind-the-scene operations use L-variables that are located after what’s defined by the interface of a logic block.

The binary interfaces located in compiled blocks do not carry the names used when defining those interfaces.

Types

The variables defined in an interface belong to three general categories:

  • Elementary types: primitive types not exceeding 4 bytes (e.g. BYTE, WORD, INT)
  • Complex types: compound types (e.g. ARRAYs) and large types (e.g. DATE_AND_TIME)
  • Parameter types: block number, timer, counter, pointers or references
=> Elementary types: ("normal" types)
TYPE     BITSIZE  DESCRIPTION
BOOL           1  single bit stored on 1 byte
BYTE           8  unsigned integer
CHAR           8  ascii character
WORD          16  unsigned integer
INT           16  signed integer
DWORD         32  unsigned integer
DINT          32  signed integer
REAL          32  ieee-754 fp32 number
DATE          16  date (number of days since Jan 1 1990)
S5TIME        16  elapsed time in [0, 2h46m30s] (*)
TIME          32  elapsed time in ms, range +/- ~24d20h
TIME_OF_DAY   32  time of day in ms since midnight

=> Complex types: ("normal" types, continued)
TYPE     BITSIZE  DESCRIPTION
DATE_AND_TIME 64  timestamp (*)
STRING[n]     var strings, 16 to 2048 bits, n in [0,254] (*)
ARRAY         var N-dimensional arrays (*)
STRUCT        var structures

=> Parameter types: ("special" types, used in IN/OUT/IN_OUT sections)
TYPE     BITSIZE  DESCRIPTION
POINTER       48  pointers (*)
ANY           80  pointers with size (*)
TIMER         16  timer number
COUNTER       16  counter number
BLOCK_FB      16  FB number
BLOCK_FC      16  FC number
BLOCK_DB      16  DB number
BLOCK_SDB     16  SDB number

(*) details follow

JEB generates equivalent native types. They carry the same names and may be examined with the Type Editor in the GUI (menu Native, handler Type Editor).

Most types are self-explanatory. A few types require additional information.

S5TIME type

The S5TIME type is essentially a BCD (binary coded decimal) value ranging from 0 to 999 (in 1/10s), with a multiplier from 1 to 1000, stored on a word. The maximum value is therefore 9990 seconds, which is 2h46m30s.

Layout of a S5TIME object in memory – Image (c) Siemens AG

DATE_AND_TIME type

This type, also referred to as DT, holds a date/time value (similar to another type S7TIME (described later), although the S7TIME uses 6-byte instead of 8). It is limited to dates after Jan 1 1984. Each component of the DT is BCD-coded:

Byte     Value    Description
0        Year     90-99=>1990-1999, 00-89=>2000-2089
1        Month    1 to 12
2        Day      1 to 31
3        Hour     0 to 23
4        Minute   0 to 59
5        Second   0 to 59
6 (hi)   Millis2  0 to 9 (*100)
6 (lo)   Millis1  0 to 9 (*10)
7 (hi)   Millis0  0 to 9
7 (lo)   DoW      1 to 7 (1=Sunday)

Examples of encodings:

90010100 00000002: DT#1990-1-1:0:0:0.0
22031406 13281232: DT#2022-3-14-6:13:28.123

Array types

Array types of single- or multi-dimensional types whose element type may be any primitive of complex type, with the exception of ARRAY.

Note that it is common practice for PLC programmers to use non-zero based arrays, e.g. ARRAY[1 ..10, 1..20 ] of INT. The first element of this two-dimensional array would be [1,1]. Therefore the translated code to access an element [x,y] in memory is slightly more elaborate than RowLength*x+y, it would be RowLength*(x-1)+(y-1).

String types

The string types are fixed-length arrays of single-byte characters. They can hold from 0 to 254 characters. The layout in memory is as follows:

M L A(0) ... A(n-1)
where:
  M is a byte holding the maximum length
  L is the current string length (L <= M)
  A(i) are the string bytes

Example of a STRING[8]:
  08 05 41 41 41 41 41 00 00 00
would be the 5-char string 'AAAAA', which can accommodate up to 8 characters

The string types are STRING[0], STRING[1], STRING[2], …, STRING[254]. The STRING type is an alias for STRING[254].

Just like other complex types (arrays, structs, DT), string types are always 16-byte aligned in memory.

POINTER type

The pointer type (referred to as MC7 pointer in this document) is used to reference the address of a variable. It is 6-byte long, and made of two parts:

  • The WORD at 0 is a DB number if the data is stored in a data block (else it is 0), that is, the basic pointer (see below) references a DB/DI block
  • The DWORD at 2 is a 4-byte address (referred to as MC7 address)

A MC7 address has the following bit layout:

AAAAAAAA 00000BBB BBBBBBBB BBBBBXXX
where:
  A is the area code
  B the address in bytes [0,65535]
  X the bit position in [0,7] 

The area codes are as follows: (reference: S7.AreaType)

0x00: no area
0x81: I (digital input)
0x82: Q (digital output)
0x83: M (global memory)
0x84: DB (shared DB)
0x85: DI (instance DB)
0x86: L (local data, i.e. the stack)
0x87: V (previous local data, i.e. the caller's stack)

The diagram below summarizes the memory layout of a POINTER type.

Layout of a 6-byte POINTER object in memory – Image (c) Siemens

The JEB native types associated with MC7 pointer types are:

  • For the 6-byte MC7 pointer type (full structure): the associated JEB native types for such objects are named MC7PTR_xxx
  • For the 4-byte MC7 address types: the associated JEB native types for such objects are named MC7P_xxx

Examples of encodings:

P# 100.0      :      00000320 (MC7 address)
P#M 100.0     : 0000 83000320 (MC7 pointer)
P#I 0.7       : 0000 81000007 (MC7 pointer)
P#V 1.0       : 0000 81000008 (MC7 pointer)
P#DB8.DBX10.2 : 0008 84000052 (MC7 pointer)

ANY type

ANY for common types

The ANY type, in its common form, is the combination of a pointer with a pointed non-special element type and a repetition count. It allows pointing an area of memory (including memory located in data blocks) with bounds, e.g. 7 DWORDs at memory address 100.0.

It is 10-byte long:

  • The first 4 bytes contain the pointed data type code and the repetition counter
  • The remaining 6 bytes are the POINTER bytes

Format of ANY for normal types:

10 CC RR RR, followed by a POINTER (see above)
where:
- C is the data type code (see below)
- R is the repetition count

The data type code may be one of: (refer to S7.DataType.getId())

0x01 BOOL
0x02 BYTE
0x03 CHAR
0x04 WORD
0x05 INT
0x06 DWORD
0x07 DINT
0x08 REAL
0x09 DATE
0x0A TIME_OF_DAY
0x0B TIME
0x0C S5TIME
0x0E DATE_AND_TIME
0x13 STRING

The diagram represents the ANY type layout for common types:

Layout of an ANY data type for common types – Image (c) Siemens

Examples of encodings:

P#M 50.0 BYTE 10        : 10 02 000A 0000 83000190
P#DB10.DBX10.0 S5TIME 5 : 10 0C 0005 000A 84000080
ANY for special types

The ANY type is also used to provide or receive “any” data type. It is not just a “pointer with a pointed size”. That means that special types like counters, timers, or block numbers, may be specified as well. In this case, the format of ANY is different:

Format of ANY for special types:
  0x10 CC 00 00 00 01 00 00 00 00 NN NN
where:
- CC is the data type code
  0x17 BLOCK_FB
  0x18 BLOCK_FC
  0x19 BLOCK_DB
  0x1A BLOCK_SDB
  0x1C COUNTER
  0x1D TIMER
- NN is the block/timer/counter number
- note that the repetition count is set to be 1
  a single item may be provided by this type format
- note that there is no offset, as they are N/A for the special types

The diagram below is another way to visualize the ANY type layout for special types:

Layout of an ANY data type for special types – Image (c) Siemens

Examples of encodings:

Passing FC9 to an ANY parameter : 10 18 0001 0000 00000009
Passing T2  to an ANY parameter : 10 1D 0001 0000 00000002

Reversing S7 Programs

JEB Pro can be used to reverse one or several PLC blocks making up a full program.

Binary blocks

Internally, Step 7 manipulates PLC blocks as binary blobs whose formats are officially undocumented. At least two formats appear to exist:

  1. Binary blocks used by Step 7 internal primitives, which exist inside the Step 7 program memory.
  2. Binary blocks encoded in network packets, used when uploading or downloading blocks from/to the PLC.

Both formats are supported by JEB (reference: interface IS7Block). Below is their binary specifications. Note the following:

  • Some parts may be unknown or incorrect (noted ‘?’)
  • Bytes are 8-bit, words are 16-bit, dwords are 32-bit long.
  • The s7time type uses 6 bytes and is encoded as follows:
AA AA AA AA BB BB
where:
  B: big-endian WORD, number of days since Jan 1 1984
  A: big-endian DWORD, number of milliseconds in the days
     (range: 0 to 86400000)
example:
  00 00 EA 60 00 01 represents the timestamp Jan 2 1984 00:01:00.000

Format 1 (internal, LE)

The header is 0x4E bytes in length. There is no trailer. Integers are encoded little-endian.

The JEB native type for this type is S7_BLOCK1_HEADER.

offset      type      description
00          word      source language id (see S7.LangType)
02          word      block type id (see S7.BlockType)
04          word      block number
06          word      format and/or version (?)
08          dword     total block size (=0x4E+S1+S2+S3)
0C          dword     S1= payload size in bytes (*)
10          dword     S2= interface size in bytes
14          dword     S3= ? size in bytes
18          word      ?
1A          s7time    last modification of the block
20          s7time    last modification of the interface
26          dword     key
2A          char[8]   author name
32          char[8]   family name
3A          char[8]   block name
42          byte      block version (major.minor)
43          byte      ?
44          word      crc
46          word      ?
48          word      ?
4A          word      ?
4C          word      ?
4E          byte[S1]  payload
4E+S1       byte[S2]  interface
4E+S1+S2    byte[S3]  ?
4E+S1+S2+S3 -

The payload is:

  • For a logic block: the MC7 code
  • For a data block: the current (stored) data bytes

Format 2 (network, BE)

Both header and trailers are 0x24 bytes in length. Integers are encoded big-endian.

The equivalent JEB native types are S7_BLOCK2_HEADER and S7_BLOCK2_TRAILER.

offset      type      description
00          word      magic ('pp')
02          byte      source language id (see S7.LangType)
03          byte      block type id (see S7.BlockType)
04          word      block number
08          dword     total block size
0C          dword     key
10          s7time    last modification of the block
16          s7time    last modification of the interface
1C          word      interface size in bytes
1E          word      ? length
20          word      ? length
22          word      payload size in bytes
24          byte[]    payload bytes
24+S1       byte[]    interface bytes
24+S1+S2    -         trailer, see below

The trailer is defined as:

offset      type      description
00          char[8]   author name
08          char[8]   family name
10          char[8]   block name
18          byte      block version (major.minor)
19          byte      ?
1A          word      crc
1C          word      ?
1E          word      ?
20          word      ?
22          word      ?
24          -

Block Acquisition

JEB can acquire blocks of type (1), living in the Step 7 editor program memory. Fire up the Step 7 editor, upload blocks in your Step 7 project, then start JEB, open the File menu, Acquire Simatic S7 Blocks handler.

Menu File, Handler Acquire Simatic S7 Blocks

The acquisition widget will show up. It will list binary blocks found in the Step 7 editor memory. You can save some or all of them as binary files or import them directly into a newly-created project.

S7 Block Acquisition widget showing the inspection results of a live instance of Step 7

Of course, PLC blocks may be collected by other third-party means, such as a network sniffer during upload/download, or by a memory scanner.

S7 Analysis Projects

To create a project, either acquire blocks (as described in the above section) or use the File/Open handler in the GUI client to load up a block or archive of blocks:

  • A single block file should have the .s7blk extension in order to be treated by JEB as a S7 PLC block.
  • A collection of blocks (the most likely scenario) should be placed in a zip archive having a .s7zip extension. All blocks inside the archive will be treated by the plugin.

IMPORTANT: To decompile a collection of blocks, zip them in an archive and rename it with “.s7zip” extension.

In this example, an archive (blocks.s7zip) containing 20 blocks was loaded into JEB.

A new project will display the following minimal node hierarchy:

  • The project node (top node)
  • The artifact node representing the input file (in the above example, blocks.s7zip)
  • The simatic_s7 container unit node (under the artifact), representing the virtual container for all blocks
  • The simatic_mc7 code unit node (under the container unit node), representing a machine-like view of the code and data, mapped in a unified virtual memory segment
  • Other unit nodes may be present, such as:
    • Interface definition text unit nodes for all blocks
    • A decompiler unit node under the simatic_mc7 image unit

Container Unit

The container unit, of type simatic_s7, holds the blocks, parses them and decides where their code and data will be mapped in the child unit of type simatic_mc7. Note that this way of processing blocks is not related to how blocks are processed by a PLC. It is simply the plugin’s way to organize the blocks into an entity that fits within JEB’s public interfaces and representation models of plugins adhering to the native code analysis framework.

The Segments view of a simatic_s7 unit, showing how block bytes will be mapped in a virtual memory object

As can be seen in the “Segments” view of the container unit:

  • The MC7 bytecode of code blocks (OB, FC, FB) are mapped in individual segments named .code_<BlockName> (where <BlockName> consists of the block type appended with the block number, e.g. DB1000, FC1100, OB85)
  • The payload bytes of data blocks (DB) are mapped in individual segments named .data_<BlockName>
  • The memory areas I, Q, G, C, and T are also mapped as separate segments, respectively named .globals, .inputs, .outputs, .counters, .timers

Optional segments .blk_<BlockName> holding the raw bytes of of PLC blocks may be created for informational purposes, but this option is disabled by default.

The base address used for mapping is 0x100000 (=BASE). In most cases, the MC7 codes will be found at address BASE+0x10. The data blocks will be mapped at BASE+0x10000, BASE+0x20000, etc. since a data block contains at most 65536 bytes of addressable bytes. Other segments (for M, I, Q, C, T areas) are also 0x1000-aligned and mapped after the data blocks.

Image Unit

The image unit, whose default name is “simatic_mc7 image”, owns a virtual memory object mapping the various segments described in the previous section. Those segments represent different parts of blocks (MC7 bytecode, data block bytes, memory areas, etc.).

Each segment is prefixed with block metadata information for convenience (names, timestamps, versions, etc.). Keep in my mind that most of this information is purely informative and should not be taken as-is: An attacker may manually edit block headers and change, for example, authorship information or timestamps.

In the example below, we can look at the MC7 code of FC2, who was mapped in a segment “.code_FC2”. Most of the code is standard STL code. Some instructions and idioms are not (e.g. UC FC, param-access instructions), they will be mentioned later.

Disassembled MC7 bytecode of a Function Block

The unified virtual memory also holds data block bytes. Below, one can see that DB888 was mapped at virtual address 0x10000 by the analyzer.

Data bytes of DB888

Parsing Options

When creating a new project, parsing options will be presented to the user.

S7 plugin parser options.

The currently available options are:

DisassembleCode: true to disassemble the code. Keep this option on unless code examination or decompilation is not necessary.

MapRawBlocksAtZero: true to map the raw bytes of blocks before mapping their payload (code or data). It may be useful to examine very specific bits not rendered as metadata in the various description strings present throughout the disassembly

A FC block (binary format 1, internal) whose raw bytes were mapped at address 0 in a segment “.blk_FC20”. Note that a type S7_BLOCK1_HEADER was applied to the data.

GenerateInterfaceDescriptionUnits: true to generate interface definition text units, false otherwise. The interface units are very useful to have a global look at the various fields that make up an interface, as well as (for data blocks), the default values and current values of those fields.

Example for a data block (DB 888):

1/2 – Part of the definition of DB 888. A notification indicates that the current bytes in the block differ from the default values. (See below)
2/2 – The actual values for the arrays at offsets 702, 1322, 2562.

MapActualBytesForDataBlocks: true to use the current (actual) bytes of a data block when mapping the block to VM, false to use the default values.

Actions and Navigations

Readers are encouraged to go through the JEB Manual6 pages related to Actions and Views to learn more about how to interact with the disassembly. Of particular interest, we recommend reviewing:

  • Cross-references and navigating references
  • Commenting, bookmarking
  • Renaming items, such as routines, labels
  • Viewing and creating types and prototypes
  • Checking calling conventions and processor registers for reference

Most actions offered by the GUI client are located in the Action and Native menu.

Most actions offered by the GUI client are located in the Action and Native menu.

MC7 Binary Interfaces

Processor internals

The S7 plugin uses two custom calling conventions:

  • __FC_CC for FC/SFC/OB blocks
  • __FB_CC for FB/SFB blocks

You may see their details by opening the Calling Convention Manager widget (in the Native menu)

Widget showing a custom calling conventions used by the S7 plugin, __FC_CC
Another custom calling conventions used by the S7 plugin, __FB_CC

To understand why two conventions area required to represent calls to sub-routines, we need to detail how sub-routine calls are implemented in MC7.

FC calls

The order of parameter indexing is important: IN, RET, OUT, IN_OUT.

Let’s assume FC 1001 with the following interface:

IN:
  0.0: WORD IN0
  2.0: DWORD IN1
RET:
  6.0: DWORD
  10.0: -

Note that this interface uses only primitives and does not have OUT or IN_OUT parameters.

In STL such an FC would be called, for example, like that:

  L     3000
  T     #tmp
  CALL  FC  1001
   IN0    :=#tmp              // symbolic ref to a variable on the stack
   IN1    :=DW#16#10002000    // literal immediate
   RET_VAL:=MD100             // address in memory for a return value

Which a compiler may translate to this piece of MC7 code:

FC 2001, calling FC 1001

Note the following:

  • The “call” was translated to a UC (unconditional call) and JU (unconditional jump)
  • The parameters are provided by reference, as raw DWORDs, just after the JU. The references are 4-byte MC7 addresses, whose structure was detailed in the previous section.

Reminder: MC7 address (4-byte): AAAAAAAA 00000XXX XXXXXXXX XXXXXBBB
where A is the area code, X the offset in bytes, B the bit position (0-7)

The area codes are as follows: (S7.AreaType)

  • I (digital input): 0x81
  • Q (digital output): 0x82
  • M (global memory): 0x83
  • DB (shared DB): 0x84
  • DI (instance DB): 0x85
  • L (local data, i.e. the stack): 0x86
  • V (previous local data, i.e. the caller’s stack): 0x87

With this laid out…

  • 0x87000000 can be translated as P#V 0.0, that is a reference to the first bytes/bits of the caller stack (the parameters are to be interpreted from the callee’s perspective). Indeed, the caller’s stack at 0 contains word 3000 (L 3000 / T LW 0).
  • 0x83000320 can be translated as P#M 100.0 (0x320=800), which matches what was assigned for RET_VAL in the original STL snippet.

Because of how the MC7 VM deals with locals, it is simpler for JEB to not treat those parameters as stack parameters. Instead, they are assigned to individual synthetic registers named PAR0, PAR1, PAR2, PARn (limited to 16 entries). Those registers can be seen in the calling convention definition for FC/SFC/OB, namely “__FC_CC”.

Let’s look at the code for FC 1001:

  L     #IN0
  L     #IN1
  +D    
  T     #RET_VAL

Which was compiled to:

MC7 code of routine FC 1001 (the callee)

First, note the signature and prototype assigned by JEB:

void __FC_CC func_FC1001(WORD*, DWORD*, DWORD*)

As said above, in this example, parameters were provided by reference. The order follows the interface definition’s: the first parameter matches the first IN; the second parameter matches the second IN; the last parameter matches RET_VAL

What about other parameter types? Are all of them provided by reference? The answer is no. Some parameters are provided by value (obviously, they must be IN parameters as well). Others are provided by references to pointers or references to any variables.

  • Primitives (BOOL, BYTE, CHAR, WORD, INT, DWORD, DINT, REAL, DATE, TIME_OF_DAY, TIME, S5TIME) are provided by reference, i.e. a 4-byte MC7 address.
  • The special types TIMER, COUNTER, BLOCK_FB, BLOCK_FC, BLOCK_DB, BLOCK_SDB (16-bit, IN only) are provided by value (16-bit, zero-padded to fit a 32-bit slot).
  • The complex types DATE_AND_TIME (8 bytes), STRING (up to 256 bytes), ARRAY and STRUCT are provided by reference to a pointer referencing the actual data. (Special types are generated, more on this below.)
  • POINTER (10 bytes) parameters are provided by reference (to the pointer parameter).
  • ANY (10 bytes) parameters are provided by reference (to the any parameter).
OB Prototypes

Note that OB blocks are always assigned the following prototype:

void __FC_CC func_OBx()

FB calls

FB (Function Blocks) mode of invocation is different. A DB is provided along with the call. The DB (referred to as the FB’s DI – that is, instance Data Block – in this context) will contain the call parameters (IN, OUT, IN_OUT), along with the rest of the block’s static data (referred to as STATIC).

The order is important: IN, OUT, IN_OUT, STATIC.

Let’s assume FB 1001 to have the following interface header (TEMP omitted):

IN:
  0.0: WORD x
  2.0: WORD y
OUT:
  4.0: WORD res
IN_OUT:
  6.0: WORD seed
STAT:
  8.0: DWORD
 12.0: BOOL

It is expected that the DB provided during a call have the same or a compatible interface. In this example, we will pass DB 1001.

In STL, the FB would be called like this:

CALL  FB  1001 , DB1001
   x     :=W#16#7
   y     :=W#16#8
   result:=MW10
   iv    :=MW14

The parameters will be copied into the provided block’s (DB 1001) actual slots. Compilation of this code:

.code_FB1:00000046              func_FB1003      proc
.code_FB1:00000046                               
.code_FB1:00000046 10 03                         BLD       3
.code_FB1:00000048 41 60 00 04                   =         L 4.0
.code_FB1:0000004C FB 7C                         CDB                  ;1
.code_FB1:0000004E FB 79 03 E9                   OPN       DI 1001    ;2
.code_FB1:00000052 FE 6F 00 00                   TAR2      LD 0       ;3
.code_FB1:00000056 30 03 00 07                   L         7          ;4
.code_FB1:0000005A 7E 56 00 00                   T         DIW 0      ;...
.code_FB1:0000005E 30 03 00 08                   L         8
.code_FB1:00000062 7E 56 00 02                   T         DIW 2
.code_FB1:00000066 12 0E                         L         MW 14
.code_FB1:00000068 7E 56 00 06                   T         DIW 6
.code_FB1:0000006C FE 0B 84 00+                  LAR2      P#DBX 0.0  ;5
.code_FB1:00000072 FB 72 03 E9                   UC        FB 1001    ;6
.code_FB1:00000076 FE 6B 00 00                   LAR2      LD 0       ;7
.code_FB1:0000007A 7E 52 00 04                   L         DIW 4      ;8
.code_FB1:0000007E 13 0A                         T         MW 10      ;...
.code_FB1:00000080 7E 52 00 06                   L         DIW 6
.code_FB1:00000084 13 0E                         T         MW 14
.code_FB1:00000086 FB 7C                         CDB                  ;9
.code_FB1:00000088 10 04                         BLD       4
.code_FB1:0000008A 65 00                         BE
.code_FB1:0000008A                               
.code_FB1:0000008A              func_FB1003      endp

Notes:

  1. The current DI (since the caller is itself an FB) is saved by being transferred to DB
  2. The to-be instance data block is opened
  3. AR2 is copied to LD0
  4. IN and IN_OUT parameters are copied to the instance DB
  5. AR2 is to offset 0 (N/A here, useful in the case of multi-instance data blocks; note that the attentive reader may have noticed that the pointer is loaded with an area DB! Why not DI? Well, the area will be disregarded by the client code in the callee routine, only the offset part of the pointer is used. )
  6. The call is translated to UC
  7. The caller’s AR2 is restored
  8. IN_OUT and OUT parameters are read and transferred to their final destination
  9. The DI that was in-use before the call is restored

Unlike an FC call, the parameters are located in the instance data block. The transfer does not involve the local stack.

The prototype of FB methods uses the __FB_CC convention:

void __FB_CC func_FB1003(_DATA_FB1003*, DWORD)

They use two parameters:

  • The first one is a pointer to the associated data block type. It is stored inside the register rDI.
  • The second one is an offset inside this data block. For single-instance data block (common case), that offset, held in the register AR2, is 0. For multi-instance data blocks, it may differ. Note that the decompiler plugin does not support multi-instance data blocks at the time of writing.

OB1 local data

The OB1 may be the most important block of your Simatic programs. While it adheres to the general structure of OB blocks (that is, a parameter-less version of FC blocks), OB1 has an important specificity to keep in mind: the first 20 (0x14) bytes of its local area is set up with important fields when the block is invoked.

off  type           name        description
 00  BYTE           EV_CLASS    event class (0x11= OB1 is active)
 01  BYTE           SCAN_1      scan type (*)
 02  BYTE           PRIORITY    priority class (?)
 03  BYTE           OB_NUMBER   OB number (1)
 04  BYTE           RESERVED_1  -
 05  BYTE           RESERVED_2  -
 06  INT            PREV_CYCLE  run time of previous cycle (ms)
 08  INT            MIN_CYCLE   min cycle time since last start-up
 0A  INT            MAX_CYCLE   max cycle time since last start-up
 0C  DATE_AND_TIME  DATE_TIME   OB calling timestamp

(*) scan types:
1: completion of a warm restart
2: completion of a hot restart
3: completion of the main cycle
4: completion of a cold restart
5: first OB1 cycle of the new master CPU
Refer to the reference documentation for more details on scan types.

You may see that by checking the interface of an OB1 block loaded in your analysis project. It is likely (although not necessary) that the interface TEMP data (locals) will start with 6 BYTEs, 3 INTs, and 1 DATE_AND_TIME fields.

The native structure used by JEB to represent this header is called OB1_HEADER. You may examine it using the native type editor widget (menu Native, Type Editor).

Interface of an OB1 block. Native type OB1_HEADER examined in the type editor widget.

Other OB blocks also receive parameters on their stack upon execution. Refer to the S7 programming manuals for details.

Idiomatic Constructs

N-way branching

The way N-way conditional branching is implemented in MC7 is via the JL instruction.

Example:

      L     MB 100  // load m[100] inside ACCU1LL (=x)
      JL    labx    // default target (x>=5)
      JU    lab0    // target if x==0
      JU    lab1    // target if x==1
      JU    lab2    // target if x==2
      JU    lab1    // target if x==3
      JU    lab2    // target if x==4
labx: L     1
      JU    next
lab0: L     W#16#10
      JU    next
lab1: L     W#16#100
      JU    next
lab2: L     W#16#1000
      JU    next
next: T     #RET_VAL

This would get decompiled as something like:

    ...
    switch(x) {
        case 0: {
            v0 = 0x10;
            break;
        }
        case 1: 
        case 3: {
            v0 = 0x100;
            break;
        }
        case 2: 
        case 4: {
            v0 = 0x1000;
            break;
        }
        default: {
            v0 = 1;
        }
    }
    ...

Decompiling MC7

The S7 decompiler plugin is a gendec 7 plugin. As such, the plugin adheres to the INativeDecompilerPlugin interface, and can itself be customized via INativeDecompilerExtension plugin extensions.

Decompilation works on per-function basis. Select the function, then hit the TAB key (or menu Action, handler Decompile).

The context-menu of a method provides basic actions.

The decompiler generates a child unit of type “c“. It is represented by the client as pseudo-C code rendered in a separate fragment. (See an example below.) The pseudo-code unit, just like the disassembly code, has a flexible output actionable via the Action and Native menus. If you position the caret on a line of code and press TAB again, you will be brought back to the closest corresponding MC7 code in the disassembly view, matching the pseudo-C code.

The decompiler does not decompile to SCL. The output is not meant to be recompilable. It is meant to provide a higher-level representation of complicated, verbose, MC7 code, markable and analyzable for reverse-engineering and analysis purposes.

Example decompilation of a block FC4

Special operators

The decompiler may create the following custom operations (underlying IR: IEOperation with a FunctionOptype):

  • ExtractOff(mc7_address) -> byte_offset: extract the offset from a 4-byte MC7 address. This is equivalent to “addr >> 3) & 0xFFFF”
  • ExtractBit(mc7_address) -> bit_position: extract bit from a 4-byte MC7 address. This is equivalent to “addr & 7”
  • ToNP(mc7_address) -> native_address: convert a 4-byte MC7 address to a native VM address
  • ToMC7P(native_address) -> mc7_address: convert a 32-bit native address to a MC7 address
  • ToMC7PPTR(native_address) -> mc7_address: convert a 32-bit native address to a MC7 address referring to a MC7 pointer
  • FPOP(fpval) -> result: the following floating point operations: FPOP= SQR, SQRT, EXP, LN, SIN, COS, TAN, ASIN, ACOS, ATAN.
  • IntToBCD(int_value) -> bcd_value: convert an integer to a binary-coded decimal value
  • ReadTimer(timer_number) -> value
  • ReadCounter(counter_number) -> value
  • GetDBAddress(db_number) -> native_address
    • along with GetOBAddress, GetFBAddress, GetFCAddress, GetSFBAddress, GetSFCAddress
  • GetDBLength(db_number) -> block size
  • BitAddr(byte_offset, bit_position) -> pointer: a native pointer not referencing a byte (i.e. bit_position != 0)

Gotchas

FC conversions and invocations

As a reminder, for FC blocks, the prototypes should be converted to:

  • for special type arguments (block, timer, counter): by value
  • for primitives type arguments: by reference: MC7 address to the actual data
  • for POINTER/ANY arguments: by reference: MC7 address to the actual data
  • for complex types: by double-reference: MC7 address to MC7 pointer to the actual data

However, when generating native prototypes for FC blocks, the converter does not do that for primitive type arguments: the generated prototype uses native reference types instead of MC7 opaque references.
e.g. a function (WORD,TIMER,STRING)
will have its native prototype set to (WORD*,WORD,MC7P_MC7PTR_STRING)
instead of (MC7P_WORD,WORD,MC7P_MC7PTR_STRING)

As for invocations: instead of rendering opaque MC7 references, such func1(0x87000010, 0x84001000), the decompiler will attempt to replace them by native references wrapped in ToMC7P or ToMC7PPTR operators, e.g.
func1(ToMC7P(&varY), ToMC7P(&varZ))

Limitations

Below is a list of limitations, at the time of writing. Some limitations will disappear as the decompiler matures.

  • Some data types are not properly rendered by the AST component, e.g. time and date types. Most would be rendered as regular integers instead of being interpreted and rendered as pseudo strings.
  • The decompiler does not support multi-instance data blocks.
  • Nested bit operations, such as A(, O(, ), etc. are currently not translated and will fail a decompilation
  • The CPU is assumed to have 2 accumulators, not 4.
  • MCR (master-control relay) is disregarded.
  • The decompiler may fail converting MC7 pointers to native pointers (referencing the virtual memory).
  • Some stack variables, representing L-variables, may subsist and appear to clutter a decompilation output. The reason is that called FC’s have access to the stack of their caller (V area), and establishing guarantee that that area is accessed as intended is very hard to establish. Unsafe optimizers may clear variables when they are deemed unused; however, in the general case, many locals should stay in place.

Generally, decompilation of MC7 code presents challenges stemming from the execution environment of MC7 and the design of the MC7 virtual machine itself: multiple memory areas (no unified VM), unorthodox pointer structures, etc. While gendec deals with those constructs in a generic way and attempts to generate pseudo-C code best representing them, it will not succeed in producing the best or most readable code in many scenarios. Such issues will be ironed out by incremental upgrades. Power-users should also keep in mind that JEB offers an expansive API allowing them to craft all sorts of extensions, including decompiler IR optimizers or AST massagers.

Library functions

While SFC and SFB blocks are reserved for system uses, the common convention is to reserve the low ranges of FC/FB block numbers for library code not classified as system code, such as utility routines whose interfaces were standardized by the IEC (International Electrotechnical Commission).

For a number of reasons, it may be inconvenient or impossible to include those blocks in your JEB project. Consequently, how would a call to a library FC or a system FC be rendered, since their prototype is theoretically unknown? While gendec has several way to recover prototypes by heuristics, the S7 extension also ships with a database of library block types and numbers with their common name and interface.

Example: if a call to FC 9 is found, but no FC 9 exists in the project, the block library will be checked for a match. In this case, the block will be understood as being “EQ_DT”. Refer to the S7 system reference manuals for details on well-known library and system blocks.

Public API

Users may craft extensions, such as scripts and plugins, in Java or Python. The reference documentation for JEB public API is located at https://www.pnfsoftware.com/jeb/apidoc.

The public types (classes, interfaces, unions) specifically exposed by the S7 analysis modules are located in the com.pnfsoftware.jeb.core.units.code.simatic package.

A course to the general JEB API is outside the scope of this manual. Users are encouraged to visit
https://www.pnfsoftware.com/jeb/manual/dev/introducing-jeb-extensions/
https://github.com/pnfsoftware/jeb-samplecode/tree/master/scripts
as well as this blog to learn more on this topic.

Conclusion

This document’s original purpose was to be a usage manual for JEB S7 block analysis extensions.

It grew into a full-blown introduction to Simatic S7 PLC reverse engineering. While the first half is mostly tool-agnostic, the second half demonstrates how JEB can be used to speed up the analysis of S7-300/S7-400 PLC programs, from block acquisition to block analysis and code disassembly, interface recovery, and of course, decompilation.

This first draft will be updated and augmented in the future, as the extensions mature. Thank you for reading, and a big thank you to our users for your continued support!

Nicolas Falliere (nico at pnfsoftware dot com)
Twitter @jebdec, Slack @jebdecompiler

  1. The S7 analysis modules (https://www.pnfsoftware.com/jeb/plc) ship with JEB Pro, and are also available with JEB Demo, the trial version of JEB Pro.
  2. An analysis of the Stuxnet infection code targeting S7-300 devices by this author can be found in the Symantec paper (archived at https://www.pnfsoftware.com/other/w32_stuxnet_dossier.pdf)
  3. “Simatic – Statement List (STL) for S7-300 and S7-400 Programming – Function Manual” (archived at https://www.pnfsoftware.com/other/s7_400_stl_rev2017.pdf)
  4. The MC7 disassembler in JEB was implemented by analyzing the code generated by the Step 7 compiler.
  5. STL instruction set quick HTML reference (https://www.pnfsoftware.com/jeb/simatic-s7-stl-opcodes.html)
  6. JEB Manual pages for Actions (https://www.pnfsoftware.com/jeb/manual/actions/)
  7. gendec is JEB’s generic decompilation engine

Writing dexdec IR optimizer plugins

Starting with JEB 4.2, users have the ability to instruct dexdec1 to load external Intermediate Representation (IR) optimizer plugins. 2

From a very high-level perspective, a Dex method scheduled for decompilation goes through the following processing pipeline:

  1. Dalvik method converted to low-level IR
  2. SSA transformation and Typing
  3. IR optimizations
  4. Final high-level IR converted to AST
  5. AST optimizations
  6. Final clean AST rendered as pseudo-Java code (NOTE: access to the AST is already possible via JEB’s Java AST API)

Phase 3 consists of repeatedly calling IR processors, that essentially take an input IR and transform it into another, further refined IR (that process is called “lifting”). IR processors range from junk code cleaner, to variable propagation, immediate propagation, constant folding, higher-level construct rebuilding, compound predicate rebuilding, code restructuring, to all sort of obfuscation removal, advanced optimizers that may involve emulation, dynamic or symbolic execution, etc.

By working at this level, power-users have the ability to write custom deobfuscators, that we may not be able to deliver as JEB built-ins for a variety of reasons (e.g. obfuscation specific to a single group of files, custom protection to files under NDA, etc.).

Sample dexdec IR script plugin applying custom deobfuscation to recover strings on a protected sample

A sample dexdec IR plugin

dexdec IR plugins are JEB back-end plugins (not front-end scripts). Therefore, they are to be dropped in the coreplugins folder (or coreplugins/scripts for plugin scripts). They can be written as:

  • Precompiled jar files: the source language can be anything that compiles to Java bytecode; those plugins cannot be hot-swapped, and therefore are not ideal for prototyping/experimenting; they are great for mature plugins though.
  • Java plugin scripts: single Java source files. Strong typing and IDE integration (e.g. with Eclipse or IntelliJ) with Javadoc makes it ideal for developing complex plugins. Hot reload is supported. (They can be seamlessly modified while JEB is running, making them great for prototyping.)
  • Python plugin scripts: written in 2.7 syntax. Hot reload is supported. Restriction: unlike other plugins, an instance of a Python script plugin may be shared by multiple decompilation threads. Therefore, they must be thread-safe and support concurrency.

In this blog, we will show how to write a Python plugin script. Users familiar with JEB client scripting will be in familiar territory.

IMPORTANT! Note that loading such plugins is not enabled by default in JEB. Add the following line to your bin/jeb-engines.cfg file to enable loading Python plugins: .LoadPythonPlugins = true

dexdec ir plugins must implement the IDOptimizer interface. In practice, it is highly recommended to extend the implementing class AbstractDOptimizer, like this:

from com.pnfsoftware.jeb.core.units.code.android.ir import AbstractDOptimizer

# sample IR plugin, does nothing but log the IR CFG
class DOptSamplePython(AbstractDOptimizer):

  # perform() returns the number of optimizations performed
  def perform(self):
    self.logger.info('MARKER - Input IR-CFG: %s', self.cfg)
    return 0

IMPORTANT! All dexdec IR public interfaces and types are located in the com.pnfsoftware.jeb.core.units.code.android.ir package. Keep a tab opened on this page while you develop IR plugins!

The skeleton above:

  • must have the same filename as the plugin class, therefore DOptSamplePython.py
  • must be dropped in coreplugins/scripts/
  • requires Python script plugins to be enabled in your engines configuration

If you haven’t done so, start JEB. Your plugin should appear in the list of dexdec plugins. Check the Android menu, Decompiler Plugins handler:

A list of external Dex decompiler plugins

Now load a dex/apk, and decompile any class. Your plugin will eventually be called. The logger view should attest to that by displaying multiple “MARKER – Input IR-CFG: …” lines.

dexdec Intermediate Representation

dexdec‘s IR consists of IDElement objects. Every IR statement is an IDInstruction, itself an IDElement. (All those types and their attributes are described in depth in the API doc.) When an IR plugin is called, it “receives” an IDMethodContext (representing a decompiled method), stored in the optimizer’s ctx public field. The IR CFG, a control flow graph consisting of IR statements, can be retrieved via ctx.getCfg(). It is also stored in the cfg public field, for convenience. A formatted IR CFG may look like this:

0000/2+  !onCreate(v4<com.pnfsoftware.raasta.AppHelp>, v5<android.os.Bundle>)<void>                            
0002/2:  !requestWindowFeature(v4<com.pnfsoftware.raasta.AppHelp>, 1)<boolean>                                 
0004/3:  !setContentView(v4<com.pnfsoftware.raasta.AppHelp>, 7F030000)<void>                                   
0007/5:  !x4<android.webkit.WebView> = ((android.webkit.WebView)findViewById(v4<com.pnfsoftware.raasta.AppHelp>, 7F070000)<android.view.View>)<android.webkit.WebView>  
000C/2:  !loadData(x4<android.webkit.WebView>, getString(v4<com.pnfsoftware.raasta.AppHelp>, 7F05005B)<java.lang.String>, "text/html", "utf-8")<void>  
000E/3:  !setBackgroundColor(x4<android.webkit.WebView>, 0)<void>                                              
0011/1:  !setDefaultTextEncodingName(getSettings(x4<android.webkit.WebView>)<android.webkit.WebSettings>, "utf-8")<void>  
0012/1:  return         

Statements (IDInstruction) can have any of the following opcodes (see DOpcodeType):
– IR_NOP: no-operation
– IR_ASSIGN: assignment
– IR_INVOKE: invocation (including new object and new array construction)
– IR_JUMP: unconditional jump
– IR_JCOND: conditional jump
– IR_SWITCH: switch statement
– IR_RETURN: return statement
– IR_THROW: throw statement
– IR_STORE_EXCEPTION: exception retrieval (special)
– IR_MONITOR_ENTER: VM monitor acquisition
– IR_MONITOR_EXIT: VM monitor release

Statement operands are themselves IDElements, usually IDExpressions. Examples: IDImm (immediate values), IDVar (variables), IDOperation (arithmetic/bitwise/cast operations), IDInvokeInfo (method invocation details), IDArrayElt (representing array elements), IDField (representing static or instance fields), etc. Refer to the hierarchy of IDElement for a complete list.

IR statements can be seen as recursive IR expression trees. They can be easily explored (visitXxx method()) and manipulated. They can be replaced by newly-created elements (see IDMethodContext.createXxx methods). Data-flow analysis can be performed on IR CFG, to retrieve use-def and def-use chains, and other variable liveness and reachability information (see cfg.doDataFlowAnalysis).

Use-case: cleaning useless Android calls

Let’s put this new API to practical, real-world use. First, some background: JEB ships with emulator-backed IR optimizers that attempt to auto-decrypt immediates such as strings. While this deobfuscator generally performs well on protected files, lately, we’ve received samples for which strings were not decrypted. The reason is quite straight-forward, see this example:

throw new java.lang.IllegalStateException(o.isUserRecoverableError.read(((char)android.text.TextUtils.getOffsetBefore("", 0)), 12 - java.lang.Long.compare(android.os.Process.getElapsedCpuTime(), 0L), (android.view.ViewConfiguration.getFadingEdgeLength() >> 16) + 798).intern());

In the above code (extracted from a protected method), read is a string decryptor. Alas, the presence of calls such as:

  • TextUtils.getOffsetBefore(“”, 0))
  • Long.compare(Process.getElapsedCpuTime(), 0L)
  • ViewConfiguration.getFadingEdgeLength() >> 16

prevent the generic decryptor from kicking in. Indeed, what would an emulator be supposed to make with those calls to external APIs, whose result is likely to be context-dependent? In practice though, they could be resolved by some ad-hoc optimizations:

  • getOffsetBefore() algorithm is (almost) straightforward
  • getElapsedCpuTime() also returns strictly positive results, making compare() operation predictable
  • getFadingEdgeLength() returns small ints, less than 0x10000

We will craft the following IR optimizer: (file RemoveDummyAndroidApiCalls.py)

from com.pnfsoftware.jeb.core.units.code.android.ir import AbstractDOptimizer, IDVisitor

class RemoveDummyAndroidApiCalls(AbstractDOptimizer):  # note that we extend AbstractDOptimizer for convenience, instead of implementing IDOptimizer from scratch
  def perform(self):
    # create our instruction visitor
    vis = AndroidUtilityVisitor(self.ctx)
    # visit all the instructions of the IR CFG
    for insn in self.cfg.instructions():
      insn.visitInstruction(vis)
    # return the count of replacements
    return vis.cnt

class AndroidUtilityVisitor(IDVisitor):
  def __init__(self, ctx):
    self.ctx = ctx
    self.cnt = 0

  def process(self, e, parent, results):
    repl = None

    if e.isCallInfo():
      sig = e.getMethodSignature()

      # TextUtils.getOffsetBefore("", 0)
      if sig == 'Landroid/text/TextUtils;->getOffsetBefore(Ljava/lang/CharSequence;I)I' and e.getArgument(0).isImm() and e.getArgument(1).isImm():
        buf = e.getArgument(0).getStringValue(self.ctx.getGlobalContext())
        val = e.getArgument(1).toLong()
        if buf == '' and val == 0:
          repl = self.ctx.getGlobalContext().createInt(0)

      # Long.compare(xxx, 0)
      elif sig == 'Ljava/lang/Long;->compare(JJ)I' and e.getArgument(1).isImm() and e.getArgument(1).asImm().isZeroEquivalent():
        val0 = None
        arg0 = e.getArgument(0)
        if arg0.isCallInfo():
          sig2 = arg0.getMethodSignature()
          if sig2 == 'Landroid/os/Process;->getElapsedCpuTime()J':
            # elapsed time always >0, value does not matter since we are comparing against 0
            val0 = 1
        if val0 != None:
          if val0 > 0:
            r = 1
          elif val0 < 0:
            r = -1
          else:
            r = 0
          repl = self.ctx.getGlobalContext().createInt(r)

      # ViewConfiguration.getFadingEdgeLength()
      elif sig == 'Landroid/view/ViewConfiguration;->getFadingEdgeLength()I':
        # always a small positive integer, normally set to FADING_EDGE_LENGTH (12)
        repl = self.ctx.getGlobalContext().createInt(12)

    if repl != None and parent.replaceSubExpression(e, repl):
      # success (this visitor is pre-order, we need to report the replaced node)
      results.setReplacedNode(repl)
      self.cnt += 1

What does this code do:
– First, it enumerates and visits all CFG instructions.
– The visitor checks for IDCallInfo IR expressions matching the kinds of Android framework API calls described above: getOffsetBefore(), compare(getElapsedCpuTime(), 0), getFadingEdgeLength()
– It evaluates and calculates the results, and replaces IR call expressions (IDInvokeInfo) by newly-created constants (IDImm).

The resulting IR, which the plugin could print, would look like:

throw new java.lang.IllegalStateException(o.isUserRecoverableError.read(((char)0, 12 - 1, 0 + 798).intern());

Subsequently, other optimizers, built into dexdec, can kick in, clean the code further (e.g. fold constants), and make the read() invocation a candidate for string auto-decryption, yielding the following result:

Our external IR plugin is enabled. The IR can be cleaned, the auto-decryption takes place.

Done!

The sample script can be found in your coreplugins/scripts folder. Feel free to extend it further.

Tips

  • dexdec IR plugins can also be written as Java source. Have a look at the sample file DOptSampleJava.java, located in coreplugins/scripts. As a plugin grows in size and complexity, working with a strongly-typed language like Java, coupled with excellent javadoc integration in IDE, becomes extremely valuable.
  • When prototyping IR plugins, the Dalvik code targeted for deobfuscation is oftentimes contained in a single method. In such cases, it may be cumbersome or costly to decompile entire classes. To easily decompile a single method in the GUI, do Decompile with Options (Action menu or right-click), and untick “Decompile top level container class”:

With this option disabled, when your caret is positioned on a method, issuing a decompilation request will only decompile the target method, and nothing else (not even inner classes/methods of the target will be decompiled.)

  • Using the previous technique, the generated decompiled view represents an AST IJavaMethod — not the usual IJavaClass. Fully-qualified names are used to represent types, since import statements are not specified. An added value to the views associated with such units lies in the “IR-CFG” fragment, representing the final (most refined) IR before the AST generation phase kicked in:
Final IR viewed in the source unit for an IJavaMethod
  • Many IR utility routines are located in the DUtil class. Generally, explore the ir/ package’s javadoc, you will find plenty useful information in there.
  • We haven’t talked about accessing and using the emulator and sandbox. The main interface is IDState, and we will detail some of its functionality in a later post. In the meantime, you will find sample code on our GitHub repo.

That’s it for now – Have fun crafting your own IR plugins. As usual, reach us on Twitter’s @jebdec, Slack’s jebdecompiler, or privately over email. Until next time! – Nicolas

  1. dexdec is JEB’s Dex/Dalvik decompiler; gendec is JEB’s generic decompiler for all other architectures (x86, arm, etc.).
  2. Note that gendec has been allowing that for quite some time; its IR is different than dexdec‘s IR though.

JEB’s GENDEC IR Emulation for Auto-Decryption of Data Items

Under some circumstances, JEB’s generic decompiler is able to detect inline decryptors, and subsequently attempt to emulate the underlying IR to generate plaintext data items, both in the disassembly view and, most importantly, decompiled views.1

This feature is available starting with JEB 4.0.3-beta. It makes use of the IREmulator object, available in the public API for scripting and plugins.

Here’s an example of a protected elf file2 (aarch64) that was encountered a few months ago:

Disassembly of the target routine

GENDEC’s unsafe optimizers are enabled by default. Let’s disable them before performing a first decompilation, in order to see what the inline decryptor looks like.

To bring up the decompilation options on-demand, use CTRL+TAB (or Command+TAB), or alternatively, menu Action, command Decompile with Options
Decompilation #1: unsafe optimizers disabled

That decryptor’s control flow is obfuscated (flattened, controlled by the state variable v5). It is called once, depending on the boolean value at 0x2F227. Here, the decrypted contents is used by system_property_get.

Below, the contents in virtual memory, pre-decryption:

Encrypted contents.

Let’s perform another decompilation of the same routine, with the unsafe optimizers enabled this time. GENDEC now will:

  • detect something that potentially could be decryption code
  • start emulating the underlying IR (not visible here, but you can easily read/write the Intermediate Representation via API) portion of code is emulated
  • collect and apply results

See the decrypted contents below. (An data item existed beforehand at 0x2F137, and the decompiler chose not to erase it.) The decompiled code on the right panel no longer shows the decryption loop: an optimizer has discarded it since it can no longer be executed.

Decompilation #2: unsafe optimizers enabled

We may convert the data item (or bytes) to a string by pressing the A key (menu Native, command Create String). The decompiled code will pick it up and refresh the AST as well.

The final result looks like:

The VM and decompiled view show the decrypted code, “ro.build.version.sdk”

A few additional comments:

  • This optimizer is considered unsafe3 because it is allowed to modify the VM of the underlying native code unit, as seen above.
  • The optimizer is generic (architecture-agnostic). It performs its work on the underlying IR mid-stage in the decompilation pipeline, when various optimizations are applied.
  • It makes use of public API methods only, mostly the IREmulator class. Advanced users can write similar optimizers if they choose to. (We will also publish the code of this optimizer on GitHub shortly, as it will serve as a good real-life example of how to use the IR emulator to write powerful optimizers. It’s slightly more than 100 lines of Java.)

We hope you enjoy using JEB 4 Beta. There is a license type for everyone, so feel free to try things out. Do not hesitate to reach out to us on Twitter, Slack, or privately over email! Thanks, and until next time 🙂

  1. Users familiar with JEB’s Dex decompilers will remember that a similar feature was introduced to JEB 3 in 2020, for Android Dalvik code.
  2. sha256 43816c47315aab27e50e6f895774a7b86d591807179e1d3262446ab7d68a56ef also available as lib/arm64-v8a/libd.so in 309d848275aa128ebb7e27e570e5a2876977122625638630a6c61f7434b771c3
  3. “unsafe” in the context of decompilation; unsafe here is not to be understood as, “could any code be executed on the machine”, etc.

Traveling Around Mars With C Emulation

Disclaimer: a long time ago in our galaxy, we published part 1 of this blog post; then we decided to wait for the next major release of JEB decompiler before publishing the rest. A year and a half later, JEB 4.0 is finally out! So it is time for us to publish our complete adventure with MarsAnalytica crackme. This time as one blog covering the full story.

In this blog post, we will describe our journey toward analyzing a heavily obfuscated crackme dubbed “MarsAnalytica”, by working with JEB’s decompiled C code 1.

To reproduce the analysis presented here, make sure to update JEB to version 4.0+.

Part 1: Reconnaissance

MarsAnalytica crackme was created by 0xTowel for NorthSec CTF 2018. The challenge was made public after the CTF with an intriguing presentation by its author:

My reverse engineering challenge ‘MarsAnalytica’ went unsolved at #nsec18 #CTF. Think you can be the first to solve it? It features heavy #obfuscation and a unique virtualization design.

0xTowel

Given that exciting presentation, we decided to use this challenge mainly as a playground to explore and push JEB’s limits (and if we happen to solve it on the road, that would be great!).

The MarsAnalytica sample analyzed in this blog post is the one available on 0xTowel’s GitHub 2. Another version seems to be available on RingZer0 website, called “MarsReloaded”.

So, let’s examine the beast! The program is a large x86-64 ELF (around 10.8 MB) which, once executed, greets the user like this:

Inserting a dummy input gives:

It appears we have to find a correct Citizen ID! Now let’s open the executable in JEB. First, the entry point routine:

Entry Point

Ok, the classic libc entry point, now let’s look at strings and imports:

A few interesting imports: getchar() to read user input, and putchar() and puts() to write. Also, some memory manipulation routines, malloc() and memcpy(). No particular strings stand out though, not even the greeting message we previously saw. This suggests we might be missing something.

Actually, looking at the native navigation bar (right-side of the screen by default), it seems JEB analyzed very few areas of the executable:

Navigation Bar
(green is cursor’s location, grey represents area without any code or data)

To understand what happened let’s first look at JEB’s notifications window (File > Notifications):

Notifications Window

An interesting notification concerns the “Initial native analysis styles”, which indicates that code gaps were processed in PROLOGUES_ONLY mode (also known as a “conservative” analysis). As its name implies, code gaps are then disassembled only if they match a known routine prologue pattern (for the identified compiler and architecture).

This likely explains why most of the executable was not analyzed: the control-flow could not be safely followed and unreferenced code does not start with common prologue patterns.

Why did JEB used conservative analysis by default? JEB usually employs aggressive analysis on standard Linux executables, and disassembles (almost) anything within code areas (also known as “linear sweep disassembly”). In this case, JEB went conservative because the ELF file looks non-standard (eg, its sections were stripped).

Explore The Code (At Assembly Level)

Let’s take a look at the actual main() (first argument of __libc_start_main()):

main() code
(part 4)

Ok… that’s where the fun begins!

So, first a few memcpy() to copy large memory areas onto the stack, followed by series of “obfuscated” computations on these data. The main() routine eventually returns on an address computed in rax register. In the end, JEB disassembler was not able to get this value, hence it stopped analyzing there.

Let’s open the binary in JEB debugger, and retrieve the final rax value at runtime: 0x402335. We ask JEB to create a routine at this address (“Create Procedure”, P), and end up on very similar code. After manually following the control-flow, we end up on very large routines — around 8k bytes –, with complex control-flow, built on similar obfuscated patterns.

And yet at this point we have only seen a fraction of this 10MB executable… We might naively estimate that there is more than 1000 routines like these, if the whole binary is built this way (10MB/8KB = 1250)!

Most obfuscated routines re-use the same stack frame (initialized in main() with the series of memcpy()). In others words, it looks like a very large function has been divided into chunks, connected through each other by obfuscated control flow computations.

At this point, it seems pretty clear that a first objective would be to properly retrieve all native routines. Arguably the most robust and elegant way to do that would be to follow the control flow, starting from the entry point routine . But how to follow through all these obfuscated computations?

Explore The Code (At C Level)

Let’s now take a look at the pseudo-C code produced by JEB for those first routines. For example, here is main():

Decompiled main()

Overall, around 40 lines of C code, most of them being simple assignments, and a few others being complex operations. In comparison to the 200 non-trivial assembly instructions previously shown, that’s pretty encouraging.

What Do We Know

Let’s sum up what we noticed so far: MarsAnalytica’s executable is divided into (pretty large) handler routines, each of them passing control to the next one by computing its address. For that purpose, each handler reads values from a large stack, make a series of non-trivial computations on them, then write back new values into the stack.

As originally mentioned by 0xTowel, the crackme author, it looks like a virtual-machine style obfuscation, where bytecodes are read from memory, and are interpreted to guide the execution. It should be noted that virtual machine handlers are never re-executed: execution seems to go from lower to higher addresses, with new handlers being discovered and executed.

Also, let’s notice that while the executable is strongly obfuscated, there are some “good news”:

  • There does not seem to be any self-modifying code, meaning that all the code is statically visible, we “just” have to compute the control-flow to find it.
  • JEB decompiled C code looks (pretty) simple, most C statements are simple assignments, except for some lengthy expression always based on the same operations; the decompilation pipeline simplified away parts of the complexity of the various assembly code patterns.
  • There are very few subroutines called (we will come back on those later), and also a few system APIs calls, so most of the logic is contained within the chain of obfuscated handlers.

What Can We Do

Given all we know, we could try to trace MarsAnalytica execution by implementing a C emulator working on JEB decompiled code. The emulator would simulate the execution of each handler routine, update a memory state, and retrieve the address of the next handler.

The emulator would then produce an execution trace, and provide us access to the exact memory state at each step. Hence, we should find at some point where the user’s input is processed (typically, a call to getchar()), and then hopefully be able to follow how this input gets processed.

The main advantage of this approach is that we are going to work on (small) C routines, rather than large and complex assembly routines.

There are a few additional reasons we decided to go down that road:

The C emulator would be architecture-independent — several native architectures are decompiled to C by JEB –, allowing us to re-use it in situations where we cannot easily execute the target (e.g. MIPS/ARM).

– It will be an interesting use-case for JEB public API to manipulate C code. Users could then extend the emulator to suit their needs.

This approach can only work if the decompilation is correct, i.e. if the C code remains faithful to the original native code. In other words, it allows to “test” JEB decompilation pipeline’s correctness, which is — as a JEB’s developer — always interesting!

Nevertheless, a major drawback of emulating C code on this particular executable, is that we need the C code in the first place! Decompiling 10MB of obfuscated code is going to take a while; therefore this “plan” is certainly not the best one for time-limited Capture-The-Flag competitions.

Part 2: Building a (Simple) C Emulator

The emulator comes as a JEB back-end plugin, whose code can be found on our GitHub page. It starts in CEmulatorPlugin.java, whose logic can be roughly summarized as the following pseudo-code:

emulatorState = initEmulatorState();
while(true) {
  handlerRoutine = analyze(handlerAddress) // disassemble and decompile      
  emulatorState = emulator.emulate(handlerRoutine, emulatorState);

  handlerAddress = emulatorState.getNextHandlerAddress();
  if(handlerAddress.isUnknown()){
    break;
  }
}

In this part we will focus on emulate() method. This method’s purpose is to simulate the execution of a given C routine from a given machine state, and to provide in return the final machine state at the end of the routine.

Decompiled C Code

First thing first, let’s explore what JEB decompiled code looks like, as it will be emulate() input. JEB decompiled C code is stored in a tree-structured representation, akin to an Abstract Syntax Tree (AST).

For example, let’s take the following C function:

int myfunction()
{
    int a = 1;
    while(a < 3) {
        a = a + 1;
    }
    return a;
}

The JEB representation of myfunction body would then be:

AST Representation
(rectangles are JEB interfaces, circles are values)

As of JEB 4.0, the hierarchy of interfaces representing AST elements (i.e. nodes in the graph) is the following:

AST ICElement Hierarchy

Two parts of this hierarchy are of particular interest to us, in the context of building an emulator:

Now, a method’s AST can be retrieved with JEB API by using INativeDecompilerUnit.decompile() (see CEmulatorPlugin.disassembleAndDecompile() for how to disassemble and decompile a not-yet-existing routine).

Where Is The Control Flow?

While an AST provides a precise representation of C elements, it does not provide explicitly the control flow. That is, the order of execution of statements is not normally provided by an AST, which rather shows how some elements contain others from a syntactic point-of-view.

In order to simulate a C function execution, we are going to need the control flow. So here is our first step: compute the control flow of a C method and make it usable by our emulator.

To do so, we implemented a very simple Control-Flow Graph (CFG), which is computed from an AST. The code can be found in CFG.java, please refer to the documentation for the known limitations.

Here is for example the CFG for the routine previously presented myfunction():

myfunction() CFG

Why does JEB does not provide a CFG for decompiled C code? Mainly because at this point JEB decompiler does not need it. Most important optimizations are done on JEB Intermediate Representation — for which there is indeed a CFG. On the other hand, C optimizations are mainly about “beautifying” the code (i.e. pure syntactic transformations), which can be done on the AST only 3.

Emulator Implementation

The main logic of the emulator can be found in emulate(ICMethod method, EmulatorState inputState), which emulates a whole C method from a given input state:

CFG cfg = CFG.buildCFG(method);
ICStatement currentStatement = cfg.getEntryPoint();

while(currentStatement != null) {
   currentStatement = emulateStatement(cfg, currentStatement);
}

Before digging into the emulation logic, let’s see how emulator state is represented and initialized.

Emulator State

The emulator state is a representation of the machine’s state during emulation; it mainly comprehends the state of the memory and of the CPU registers.

The memory state is a IVirtualMemory object — JEB interface to represent virtual memory state. This memory state is created with MarsAnalytica executable initial memory space (set by JEB loader), and we allocate a large area at an arbitrary address to use as the stack during emulation:

// initialize from executable memory
memory = nativeUnit.getMemory();

// allocate large stack from BASE_STACK_POINTER_DEFAULT_VALUE (grows downward)
VirtualMemoryUtil.allocateFillGaps(memory, BASE_STACK_POINTER_DEFAULT_VALUE - 0x10_0000, 0x11_0000, IVirtualMemory.ACCESS_RW);

The CPU registers state is simply a Map from register IDs — JEB specific values to identify native registers — to values:

Map<Integer, Long> registers = new HashMap<>();

Emulator Logic

The emulator processes each ICStatement in two steps (see emulateStatement()):

  1. Update the state according to the statement semantic, i.e. propagate all side-effects of the statement to the emulator state.
  2. Determine which statement should be executed next; this might involve evaluating some predicates.

For example, let’s examine the logic to emulate a simple assignment like a = b + 0x174:

void evaluateAssignment(ICAssignment assign) {
  // evaluate right-hand side
  Long rightValue = evaluateExpression(assign.getRight());

  // assign to left-hand side
  state.setValue(assign.getLeft(), rightValue);
}

The method evaluateExpression() is in charge of getting a concrete value for a C expression (i.e. anything under ICExpression), which involves recursively processing all the subexpressions of this expression.

In our example, the right-hand side expression to evaluate is an ICOperation (b + 0x17). Here is the extract of the code in charge of evaluating such operations:

Long evaluateOperation(ICOperation operation) {
        ICExpression opnd1 = operation.getFirstOperand();
        ICExpression opnd2 = operation.getSecondOperand();
        ICOperator operator = operation.getOperator();

        switch(operator.getType()) {
        case ADD:
            return evaluateExpression(opnd1) + evaluateExpression(opnd2);

[...REDACTED...]

Therefore, we simply compute a concrete result using the corresponding Java operators for each ICOperator, and recursively evaluate the operands.

Now, evaluating variable b means either reading memory or a register, depending on where b is mapped:

Long getVarValue(ICIdentifier id) {
  // read memory for local/global variables...
  if(id.getIdentifierClass() == CIdentifierClass.LOCAL || id.getIdentifierClass() == CIdentifierClass.GLOBAL) {
      return readMemory(getVarAddress(id), getTypeSize(id.getType()));
  }
  // ...otherwise read CPU register
  else {
      return registers.get(id.getId());
  }
}

If b is a local variable, i.e. mapped in stack memory, the method ICIdentifier.getAddress() provides us its offset from the stack base address. Also note that an ICIdentifier has an associated ICType, which provides us the variable’s size (through the type manager, see emulator’s getTypeSize()).

Finally, evaluating constant 0x17 in the operation b + 0x17 simply means returning its raw value:

if(expr instanceof ICConstantInteger) {
    return ((ICConstantInteger<?>)expr).getValueAsLong();
}

For statements with more complex control flow than an assignment, the emulator has to select the correct next statement from the CFG. For example, here is the emulation of a while loop wStm (ICWhileStm):

// if predicate is true, next statement is while loop body...
if(evaluateExpression(wStm.getPredicate()) != 0) {
   return cfg.getNextTrueStatement(wStm);
}
// ...otherwise next statement is the one following while(){..}
else {
   return cfg.getNextStatement(wStm);
}

Refer to the complete implementation for more glory details.

Emulating System APIs

In MarsAnalytica there are only a few system APIs that get called during the execution. Among those APIs, only memcpy() is actually needed for our emulation, as it serves to initialize the stack (remember main()). Here is the API emulation logic:

Long simulateWellKnownMethods(ICMethod calledMethod,
            List<ICExpression> parameters) {

        if(calledMethod.getName().equals("→time")) {
            return 42L; // value does not matter
        }
        else if(calledMethod.getName().equals("→srand")) {
            return 37L; // value does not matter
        }
        else if(calledMethod.getName().equals("→memcpy")) {
            ICExpression dst = parameters.get(0);
            ICExpression src = parameters.get(1);
            ICExpression n = parameters.get(2);
            // evaluate parameters concrete values
            [...REDACTED...]
            state.copyMemory(src_, dst_, n_);
            return dst_;
          }
       }
}

Demo Time

The final implementation of our tracer can be found in our GitHub page. Once executed, the plugin logs in JEB’s console an execution trace of the emulated methods, each of them providing the address of the next one:

> emulating method sub_400DA9...
  >> done; next method entry point: 0x00402335
> emulating method sub_402335...
  >> done; next method entry point: 0x00402335
> emulating method sub_402335...
  >> done; next method entry point: 0x00401b8f
> emulating method sub_401B8F...
  >> done; next method entry point: 0x004018cd
> emulating method sub_4018CD...
  >> done; next method entry point: 0x00401f62
> emulating method sub_401F62...
  >> done; next method entry point: 0x00402335
> emulating method sub_402335...
  >> done; next method entry point: 0x00403477
> emulating method sub_403477...
  >> done; next method entry point: 0x00401502
> emulating method sub_401502...
  >> done; next method entry point: 0x004018cd

[...REDACTED...]

Good news everyone: the handlers addresses are correct (we double-checked them with a debugger). In other words, JEB decompilation is correct and our emulator remains faithful to the executable logic. Phew…!

Part 3: Solving The Challenge

Plot Twist: It Does Not Work

The first goal of the emulator was to find where user’s input is manipulated. We are looking in particular for a call to getchar(). So we let the emulator run for a long time, and…

…it never reached a call to getchar().

The emulator was correctly passing through the obfuscated handlers (we regularly double-checked their addresses with a debugger), but after a few days the executed code was still printing MarsAnalytica magnificent ASCII art prompt (reproduced below).

MarsAnalytica Prompt

After investigating, it appears that characters are printed one by one with putchar(), and each of these calls is in the middle of one heavily obfuscated handler, which will be executed once only. More precisely, after executing more than one third of the whole 10MB, the program is still not done with printing the prompt!

As mentioned previously, the “problem” with emulating decompiled C code is that we need the decompiled code in the first place, and decompiling lots of obfuscated routines takes time…

Let’s Cheat

Ok, we cannot reach in a decent time the point where the user’s input is processed by the program. But the execution until this point should be deterministic. What if… we start the emulation at the point where getchar() is called, rather than from the entry-point?

In other words, we are going to assume that we “found” the place where user’s input starts to be processed, and use the emulator to analyze how this input is processed.

To do so, we used GDB debugger to set a breakpoint on getchar() and dumped both stack and heap memories at this point 5. Then, we extended the emulator to be able to initialize its memory state from stack/heap memory dumps, and change emulation start address to be the first call to getchar().

What Now?

At this point getchar() is called to get the first input character, so we let the emulator simulate this API by returning a pseudo-randomly chosen character, such that we can follow the rest of the execution. After 19 calls to getchar() we finally enter the place where user’s input is processed. Hooray…

Then, we let the emulator run for a whole day, which provided the execution trace we will be working on for the rest of this blog. After digging into the trace we noticed that input characters were passed as arguments to a few special routines.

Introducing The Stack Machine

When we first skimmed through MarsAnalytica code, we noticed a few routines that seemed specials for two reasons:

  • While obfuscated routines are executed only once and in a linear fashion (i.e. from low to high memory addresses), these “special” routines are at the very beginning of the executable and are called very often during the execution.
  • These routines’ code is not obfuscated and seems to be related with memory management at first sight.

For example, here is JEB decompiled code for the first of them (comments are ours):

long sub_400AAE(unsigned long* param0, int param1) {
    long result;
    unsigned long* ptr0 = param0;
    int v0 = param1;

    if(!ptr0) {
        result = 0xffffffffL;
    }
    else {
        // allocate new slot
        void* ptr1 = →malloc(16L);
        if(!ptr1) {
            /*NO_RETURN*/ →exit(0);
        }

        // set value in new slot
        *(int*)((long)ptr1 + 8L) = v0;

        // insert new slot in first position
        *(long*)ptr1 = *ptr0;
        *ptr0 = ptr1;
        result = 0L;
    }

    return result;
}

What we have here is basically a “push” operation for a stack implemented as a chained list (param0 is a pointer to the top of the stack, param1 the value to be pushed).

Each slot of the stack is 16 bytes, with the first 8 bytes being a pointer to the next slot and the next 4 bytes containing the value (remaining 4 bytes are not used).

It now seemed clear that these special routines are the crux of the challenge. So we reimplemented most of them in the emulator, mainly as a way to fully understand them. For example, here is our “push” implementation:

/** PUSH(STACK_PTR, VALUE) */
if(calledMethod.getName().equals("sub_400AAE")) {
    Long pStackPtr = evaluateExpression(parameters.get(0));
    Long pValue = evaluateExpression(parameters.get(1));

    long newChunkAddr = allocateNewChunk();

    // write value
    state.writeMemory(newChunkAddr + 8, pValue, 4);

    // link new chunk to existing stack
    Long stackAdr = state.readMemory(pStackPtr, 8);
    state.writeMemory(newChunkAddr, stackAdr, 8);

    // make new chunk the new stack head
    state.writeMemory(pStackPtr, newChunkAddr, 8);

}

Overall, these operations are implementing a custom data-structure that can be operated in a last-in, first-out fashion, but also with direct accesses through indexes. Let’s call this data structure the “stack machine”.

Here are the most used operators:

AddressOperator
(names are ours)
Argument(s)
0x400AAEPUSHVALUE
0x4009D7POPVALUE
0x400D08GETINDEX
0x400D55SETINDEX,VALUE
Stack Machine’s Main Operators

Tracing The Stack Machine

At this point, we modified the emulator to log only stack operations with their arguments, starting from the first call to getchar(). The full trace can be found here, and here is an extract:

S: SET index:7 value:97
S: SET index:8 value:98
S: SET index:13 value:99
S: SET index:15 value:100
S: SET index:16 value:101

[...REDACTED...]

S: PUSH 2700
S: POP (2700)
S: SET index:32 value:2700
S: GET index:32
S: PUSH 2700
S: PUSH 2
S: POP (2)
S: POP (2700)
S: PUSH 2702

[...REDACTED...]

The trace starts with a long series of SET operations, which are storing the result of getchar() at specific indexes in the stack machine (97, 98, 99,… are characters provided by the emulator).

And then, a long series of operations happen, combining the input characters with some constant values. Some interesting patterns appeared at this point, for example:

S: POP (2)   
S: POP (2700)   
S: PUSH 2702   

Here an addition was made between the two popped values, and the result was then pushed. Digging into the trace, it appears there are also handlers popping two values and pushing back a subtraction, multiplication, exclusive or, etc.

Another interesting pattern appears at several places:

S: POP (16335)   
S: POP (1234764)   
S: PUSH 1   

Looking at the corresponding C code, it is actually a comparison between the two popped values — “greater than” in this case –, and the boolean result (0 or 1) is then pushed. Once again, different comparison operators (equal, not equal, …) are used in different handlers.

Finally, something suspicious also stood out in the trace:

S: PUSH 137
S: PUSH 99
S: POP (137)
S: POP (99)

The popped values do not match the order in which they were pushed!

Digging into the code we end up on a special routine (0x402AB2), which swaps the two top-most values… So to make things clearer, the emulator logs in the execution trace a SWAP operator whenever this routine gets executed.

Where Is My Precious Operator?

Our objective here is to understand how input characters are manipulated, and what tests are done on them. In other words, we want to know for each POP/POP/PUSH pattern if it is an operation (and which operation — addition, subtraction …–), or a test (and which test — equal, greater than …–).

Again, note that routines implementing POP/POP/PUSH patterns are executed only once. So we cannot individually analyze them and rely on their addresses.

This is where working on decompiled C code becomes particularly handy. For each POP/POP/PUSH series:

  • We search in the method’s decompiled code if a C operator was used on the PUSH operand. To do so, it is as simple as looking at the operand itself, thanks to JEB decompiler’s optimizations! For example, here is a subtraction:
...
long v1 = pop(v0 - 0x65f48L); 
long v2 = pop(v0 - 0x65f48L); 
push(v0 - 0x65f48L, v1 - v2);
...

When a C operator is found in push() second operand, the emulator adds the info (with the number of operands) in the trace:

S: POP (137)
S: POP (99)
S: PUSH 38
| operation: (-,#op=2)
  • Also, we check if there is a “if” statement following a POP in the C code. For example, here is a “greater-than” check between popped values:
...
long v2 = pop(v0 - 0x65f48L); 
long v3 = pop(v0 - 0x65f48L); 
if(v2 > v3) {
...

If so, the emulator extracts the C operator used in the if statement and logs it in the trace (as a pseudo stack operator named TEST):

S: POP (16335)   
S: POP (1234764)   
S: TEST (>,#op=2) 
S: PUSH 0   

It should be noted that operands are always ordered in the same way: first poped value is on left side of operators. So operators and operands are the only thing we need to reconstruct the whole operation.

Time To Go Symbolic

At this point, our execution trace shows how the user’s input is stored onto the stack, and which operations and tests are then done. Our emulator is providing a “bad” input, so they are certainly failed checks in our execution trace. Our goal is now to find these checks, and then the correct input characters.

At this point, it is time to introduce “symbolic” inputs, rather than using concrete values as we have in our trace. To do so, we made a quick and dirty Python script to replay stack machine trace using symbolic variables rather than concrete values.

First, we initialize a Python “stack” with symbols (the stack is a list(), and the symbols are strings representing each character “c0“, “c1“, “c2“…). We put those symbols at the same indexes used by the initial SET operations:

# fill stack with 'symbolic' variables (ie, characters)
# at the initial offset retrieved from the trace
stack = [None] * 50 # arbitrary size
charCounter = 0
stack[7] = 'c' + str(charCounter) # S: SET index:7 value:c0
charCounter+=1
stack[8] = 'c' + str(charCounter) # S: SET index:8 value:c1

[... REDACTED ...]

We also need a temporary storage for expressions that get popped from the stack.

Then, we read the trace file and for each stack operation we execute the equivalent operation on our Python stack:

if operator == "SWAP":
  last = stack.pop()
  secondToLast = stack.pop()
  stack.append(last)
  stack.append(secondToLast)

elif operator == "GET":
  index = readIndexFromLine(curLine)
  temporaryStorage.append(stack[int(index)])

elif operator == "SET":
  index = readIndexFromLine(curLine)
  stack[int(index)] = temporaryStorage.pop()

elif operator == "POP":
  value = stack.pop()
  temporaryStorage.append(value)

[... REDACTED ...]

Now here is the important part: whenever there is an operation, we build a new symbol by “joining” the symbol operands and the operator. Here is an example of an addition between symbols “c5” and “c9“, corresponding respectively to the concrete input characters initially stored at index 26 and 4:

Concrete TraceSymbolic Trace
...
GET index:26

PUSH 102

GET index:4

PUSH 106

POP (106)

POP (102)

PUSH 208
| operation: (+,#op=2)
...
...
GET index:26

PUSH "c5"

GET index:4

PUSH "c9"

POP ("c9")

POP ("c5")

PUSH "c9+c5"

...
Concrete execution trace, and its corresponding symbolic trace; on the symbolic side, rather than pushing the actual result of 106 + 102, we build an addition between the two symbols corresponding to the two concrete values

Note that our symbolic executor starts with a clean stack, containing only input symbols. All constants used during the computation are indeed coming from the bytecode (the large memory area copied on the (native) stack at the beginning of the execution), and not from the stack machine.

We can then observe series of operations on input symbols getting build by successive POP/POP/PUSH patterns, and being finally checked against specific values. Here is an extract of our stack at the end:

((((c12-c10)^c13)*(c6*c14))!=16335)
(((c18^c1)^(c15-c7))!=83)
((((c9+c5)^c0)*(c17-c16))!=4294961394)
((c3-c11)!=11)
(((c2+c4)^c8)!=3)
((c8+(c15-c4))!=176)
((((c9^c10)-(c11+c18))^c6)!=4294967097)
(((c1*(c0^c17))+(c2*c16))!=9985)
(((c14*c13)-c7)!=2083)
(((c12+c3)-c5)!=110)
(((c8*c10)+(c9+c13))!=5630)
(((c5-c16)-(c2+c0))!=4294967114)
((c17*(c14^c7))!=7200)
(((c1*c3)+(c6*c11))!=17872)
(((c12-c15)-(c18*c4))!=4294961888)
(((c11*c2)+(c3*c15))!=18888)
((c16*(c5+c13))!=15049)
((c17*(c0+c10))!=12150)
((c18*(c14^c6))!=10080)
(((c7+c12)-c4)!=132)
((c8+(c1*c9))!=2453)

It seems pretty clear that those checks are the ones we are looking for, except that we need to revert inequality tests into equality tests.

Now, how to find the values of symbols “c0“, “c1“,.. passing these tests?

The Final

To find the correct input characters, we used Z3 SMT solver Python bindings, and let the solver do its magic:

from z3 import *

# initialize our characters as 8-bit bitvectors
c0 = BitVec('c0', 8)
c1 = BitVec('c1', 8)
c2 = BitVec('c2', 8)
c3 = BitVec('c3', 8)
c4 = BitVec('c4', 8)
c5 = BitVec('c5', 8)
c6 = BitVec('c6', 8)
c7 = BitVec('c7', 8)
c8 = BitVec('c8', 8)
c9 = BitVec('c9', 8)
c10 = BitVec('c10', 8)
c11 = BitVec('c11', 8)
c12 = BitVec('c12', 8)
c13 = BitVec('c13', 8)
c14 = BitVec('c14', 8)
c15 = BitVec('c15', 8)
c16 = BitVec('c16', 8)
c17 = BitVec('c17', 8)
c18 = BitVec('c18', 8)

s = Solver()

# allowed character range
s.add(c0 > 32, c0 < 127)
s.add(c0 > 32, c0 < 127)
s.add(c1 > 32, c1 < 127)
s.add(c2 > 32, c2 < 127)
s.add(c3 > 32, c3 < 127)
[... REDACTED ...]

# checks
s.add((((c12-c10)^c13)*(c6*c14))==16335)
s.add(((c18^c1)^(c15-c7))==83)
s.add((((c9+c5)^c0)*(c17-c16))==4294961394)
s.add((c3-c11)==11)
s.add(((c2+c4)^c8)==3)
s.add((c8+(c15-c4))==176)
[... REDACTED ...]

Here is another advantage to work with C code: the expressions built from our emulator’s trace are using high-level operators, which are directly understood by Z3.

Finally, we ask Z3 for a possible solution to the constraints, and we build the final string from c0, c1,… values:

m = s.model()
result = ''
result += chr(m[c0].as_long())
result += chr(m[c1].as_long())
result += chr(m[c2].as_long())
result += chr(m[c3].as_long())
...

And…

Hurray!

Conclusion

We hope you enjoy this blog post, where we used JEB C decompiled code to analyze a heavily obfuscated executable.

Please refer to our GitHub page for emulator code. While it has been tailored for MarsAnalytica crackme, it can be extended to emulate any executable’s decompiled C code (MarsAnalytica’s specific emulation logic is constrained in subclass MarsAnalyticaCEmulator).

You can run the plugin directly from JEB UI (refer to README):

By default, it will show emulation traces as text subunits in JEB project (stack machine trace in MarsAnalytica mode, or just C statements trace):

Plugin output: left panel is MarsAnalytica stack machine trace (when MarsAnalytica specific emulation logic is enabled), while right panel shows C statements emulation trace

Alternatively, the plugin comes with a headless client, more suitable to gather long running emulation traces.

Finally, kudo to 0xTowel for the awesome challenge! You can also check the excellent Scud’s solution.

Feel free to message us on Slack if you have any questions. In particular, we would be super interested if you attempt to solve complex challenges like this one with JEB!


  1. While JEB’s default decompiled code follows (most of) C syntactic rules and their semantics, some custom operators might be inserted to represent low-level operations and ease the reading; hence strictly speaking JEB’s decompiled code should be called pseudo-C. The decompiled output can also be variants of C, e.g. the Ethereum decompiler produce pseudo-Solidity code.
  2. SHA1 of the UPX-packed executable: fea9d1b1eb9d3f93cea6749f4a07ffb635b5a0bc
  3. Implementing a complete CFG on decompiled C code will likely be done in future versions of JEB, in order to provide more complex C optimizations.
  4. The actual implementation is more complex than that, e.g. it has to deal with pointers dereferencement, refer to emulateStatement() for details.
  5. Dumping memory was done with peda for GDB, and commands dumpmem stack.mem stack and dumpmem heap.mem heap

Reversing an Android app Protector, Part 3 – Code Virtualization

In this series: Part 1, Part 2, Part 3

The third part of this series is about bytecode virtualization. The analyses that follow were done statically.

Bytecode virtualization is the most interesting and technically challenging feature of this protector.

TL;DR:
– JEB Pro can un-virtualize protected methods.
– A Global Analysis (Android menu) will point you to p-code VM routines.
– Make sure to disable Parse Exceptions when decompiling such methods.
– For even clearer results, rename opaque predicates of the method to guard0/guard1 (refer part 1 of this blog for details)

What Is Code Virtualization

Relatively novel, code virtualization is possibly one of the most effective protection technique there is 1. With it come relatively heavy disadvantages, such as hampered speed of execution 2 and the difficulty to troubleshoot production code. The advantages are heightened reverse-engineering hurdles over other more traditional software protection techniques.

Virtualization in the context of code protection means:

  • Generating a virtual machine M1
  • Translating an original code object C0 meant to be executed on a machine M0 3, into a semantically-equivalent code object C1, to be run on M1.

While the general features of M1 are likely to be fixed (e.g., all generations of M1 are stack machines with such and such characteristics), the Instruction Set Architecture (ISA) of M1 may not necessarily be. For example, opcodes, microcodes and their implementation may vary from generation to generation. As for C1, the characteristics of a generation are only constrained by the capabilities of the converter. Needless to say, standard obfuscation techniques can be applied on C1. The virtualization process can possibly be recursive (C1 could be a VM implementing the specifications of a machine M2, executing a code object C2, emulating the original behavior of C0, etc.).

All in all, in practice, this makes M1 and C1 unique and hard to reverse-engineer.

Before and after virtualization of a code object C0 into C1

Example of a Protected Method

Note: all identifier names had been obfuscated. They were renamed for clarity and understanding.

Below, the class VClass was found to be “virtualized”. A virtualized class means that all non-constructor (all but <init>(*)V and <clinit>()V) methods were virtualized.

Interestingly, the constructors were not virtualized

The method d(byte[])byte[] is virtualized:

  • It was converted into an interpreter loop over two large switch constructs that branch on pseudo-code entries stored in the local array pcode.
  • A PCodeVM class was added. It is a modified stack-based virtual machine (more below) that performs basic load/store operations, custom loads/stores, as well as some arithmetic, binary and logical operations.
Virtualized method. Note the pcode array. The opcode handlers are located in two switches. This picture shows the second switch, used to handle specific operations and API calls.

A snippet of the p-code VM class. Full code here, also contains the virtualized class.

The generic interpreter is called via vm.exec(opcode). Execution falls back to a second switch entry, in the virtualized method, if the operation was not handled.

Please refer to the gist linked above for a full list of “generic” VM operations. Three examples, including one showing that the operations are not as generic as the term implies:

(specific to this VM) opcode 6, used to peek the most recently pushed object
(specific to this VM) opcode 8, a push-int operation
(specific to this VM) opcode 23 is relatively specialized, it implements an add-xor stack operation (pop, pop, push). It is quite interesting to see that the protection system does not simply generate one-to-one, dalvik-to-VM opcodes. Instead, the target routine is thoroughly analyzed, most likely lifted, high-level (compounded) arithmetic operations isolated, and pseudo-generic (in PCodeVM) or specialized (in the virtualized method) opcodes generated.

As said, negative opcodes represent custom operations specific to a virtualized method, including control flow changes. An example:

opcode -25: a if(a >=b) goto LABEL operation (first, call into opcode 55 to do a GE operation on the top two integers; then, use the result to do conditional branching)

Characteristics of the P-code VM

From the analysis of that code as well as virtualized methods found in other binaries, the characteristics of the p-code VM generated by the app protector can be inferred:

  • The VM is a hybrid stack machine that uses 5 parallel stacks of the same height, stored in arrays of:
    • java.lang.Object (accommodating all objects, including arrays)
    • int (accommodating all small integers, including boolean and char)
    • long
    • float
    • double
  • For each one of the 5 stack types above, the VM uses two additional registers for storing and loading
  • Two stack pointers are used: one indicates the stack TOP, the other one seems to be used more liberally, and is akin to a peek register
  • The stack has a reserved area to store the virtualized method parameters (including this if the method is non-static)
  • The ISA encoding is trivial: each instruction is exactly one-word long, it is the opcode of the p-code instruction to be executed. There is no concept of register, index, or immediate value embedded into the instruction, as most stack machine ISA’s have.
  • Because the ISA is so simple, the implementation of the semantics of an instruction falls almost entirely on the p-code handler. For this reason, they were grouped into two categories:
    • Semi-generic VM operations (load/store, arithmetic, binary, tests) are handled by the VM class and have a positive id. (A VM object is used by every virtualized method in a virtualized class.)
    • Operations specific to a given virtualized method (e.g., method invocations) use negative ids and are handled within the virtualized method itself.

P-code obfuscation: junk insertion, spaghetti code

While the PCodeVM opcodes are all “useful”, many specific opcodes of a virtualized method (negative ids) achieve nothing but the execution of code semantically equivalent to NOP or GOTO.

opcodes -2, -1: essentially branching instructions. A substantial amount of those can be found, including some branching to blocks with no other input but that source (i.e., an unnecessary GOTO – =spaghetti code -, or a NOP operation if the next block is the follow.)

Rebuilding Virtualized Methods

Below, we explain the process used to rebuild a virtualized method. The CFG’s presented are IR-CFG’s (Intermediate Representations) used by the dexdec 4 pipeline. Note that unlike gendec‘s IR 5, dexdec‘s IR is not exposed publicly, but its textual representation is mostly self-explanatory.

Overall, a virtualized routine, once processed by dexdec like any other routine, looks like the following: A loop over p-code entries (stored in x8 below), processed by a() at 0xE first, or by the large routine switch.

Virtualized method, optimized, virtualized

The routine a() is PCodeVM.exec(), and its optimized IR boils down to a large single switch. 6

PCodeVM.exec()

The unvirtualizer needs to identify key items in order to get started, such as the p-code entries, identifiers used as indices into the p-code array, etc. Once they have been gathered, concolic execution of the virtualized routine becomes possible, and allows rebuilding a raw version of the original execution flow. Multiple caveats need to be taken care of, such as p-code inlining, branching, or flow termination. In its current state, the unvirtualizer disregards exceptional control flow.

Below, a raw version of the unflattened CFG. Note that all operations are stack-based; the code itself has not been modified at this point, it still consists of VM stack-based operations.

Virtualized method after unflattening, raw

dexdec’s standard IR optimization passes (dead-code removal, constant and variable propagation, folding, arithmetic simplification, flow simplifications, etc.) clean up the code substantially:

Virtualized method after unflattening and IR optimizations (opt1)

At this stage, all operations are stack-based. The high-level code generated from the above would be quite unwieldy and difficult to analyze, although substantially better than the original double-switch.

The next stage is to analyze stack-based operations to recover stack slots uses and convert them back to identifiers (which can be viewed as virtual registers; essentially, we realize the conversion of stack-based operations into register-based ones). Stack analysis can be done in a variety of ways, for example, using fixed-point analysis. Again, several caveats apply, and the need to properly identify stacks as well as their indices is crucial for this operations.

Virtualized method after unflattening, IR optimizations, VM stack analysis (opt2)

After another round of optimizations:

Virtualized method after unflattening, IR optimizations, VM stack analysis, IR optimizations (opt2_1)

Once the stack analysis is complete, we can replace stack slot accesses by identifier accesses.

Virtualized method after unflattening, IR optimizations, VM stack analysis, IR optimizations, virtual registers insertion (opt3)

After a round of optimizations:

Virtualized method after unflattening, IR optimizations, VM stack analysis, IR optimizations, virtual registers insertion, IR optimizations (opt3)

At this point, the “original” CFG is essentially reconstructed, and other advanced deobfuscation passes (e.g., emulated-based deobfuscators) can be applied.

The high-level code generation yields a clean, unvirtualized routine:

High-level code, unvirtualized, unmarked

After reversing, it appears to be a modified RC4 algorithm. Note the +3/+4 added to the key.

High-level code, unvirtualized, marked

Detecting Virtualized Methods

All versions of JEB detect virtualized methods and classes: run Global Analysis (GUI menu: Android) on your APK/DEX and look for those special events:

dexdec event:
“Found Virtualized routine handler (P-Code VM)”

JEB Pro version 3.22 7 ships with the unvirtualizer module.

Tips:

  • Make sure to enable the Obfuscators, and enable Unvirtualization (enabled by default in the options).
  • The try-blocks analysis must be disabled for the class to unvirtualize. (Use MOD1+TAB to redecompile, untick “Parse Exception Blocks”).
  • After a first decompilation pass, it may be easier to identify guard0/guard1, rename, and recompile, else OP obfuscation will remain and make the code unnecessarily difficult to read. (Refer to part 1 of this series to learn about what renaming those fields to those special names means and does when a protected app is detected.)

Conclusion

We hope you enjoyed this third installment on code (un)virtualization.

There may be a fourth and final chapter to this series on native code protection. Until next time!

  1. On a personal note, my first foray into VM-based protection dates back to 2009 with the analysis of Trojan.Clampi, a Windows malware protected with VMProtect
  2. Although one could argue that with current hardware (fast x64/ARM64 processors) and software (JIT’er and AOT compilers), that drawback may not be as relevant as it used to be.
  3. Machine here may be understood as physical machine or virtual machine
  4. dexdec is JEB’s dex decompiler engine
  5. gendec is JEB’s generic decompilation pipeline
  6. Note the similarities with CFG flattened by chenxification and similar techniques. One key difference here is that the next block may be determined using the p-code array, instead of a key variable, updated after each operation. I.e., is the FSM – controlling what the next state (= the next basic block) is – embedded in the flattened code itself, or implemented as a p-code array.
  7. JEB Android and JEB demo builds do not ship the unvirtualizer module. I initially wrote this module as a proof-of-concept not intended for release, but eventually decided to offer it to our professional users who have legitimate (non malicious) use cases, e.g. code audits and black-box assessments.

The (Long) Journey To A Multi-Architecture Disassembler

Last week we presented a talk at REcon on the internals of JEB’s native disassembler.

During this talk, we focused on some of the research problems we encountered while developing our custom disassembler engine — the foundation for JEB native decompiler.

The talk’s video can be found here, and interested readers can find an extended version of the slide deck here.

If you have questions, comments or suggestions, feel free to:

Android Updates in JEB 3.2

As the latest update makes its way to all users (changelog), it is a good time to quickly recap additions related to Android analysis that made it into JEB versions 3.1.4, 3.1.5, and 3.2.

Dalvik Decompiler Updates

The newest releases of JEB contain several improvements to the Dalvik decompiler. I will highlight only a couple that users may find interesting. 1

Enumerations

Compiled Java enumerations can be complicated beasts. JEB attempts to re-sugar them to the best of its ability. On failure, regular classes extending java.lang.Enum will be rendered.

Obfuscation sometimes destroy important synthetic fields and structures that allow recovery heuristics to work. However, support should function reasonably well, even on enumeration data that was intentionally shuffled to generate decompilation errors. Moreover, and to keep with the spirit of interactivity in JEB, enumerated fields can be renamed – and it is done consistently over the code base, including over reconstructed switches making use of such enums.

Decompiled enums in android.arch.lifecycle. Renaming and cross-referencing enumerated constants is supported.

Custom enumerated constants are also properly reconstructed, including:

  • Field annotations
  • Custom initializers (see below)
  • Additional methods and method overrides
In this complex enumeration, the red block shows a custom initializer. Other interesting bits are the use of overrides and custom methods, annotations, as well as default and non-default constructors.

Switches

Support was recently added for switch-on-enum and switch-on-string (partial support for the latter, to be continued in the next software update).

This successfully reconstructed switch-on-string is implemented as a double-switch idiom by dx (a sparse switch on hashCode/equals to generate custom indices i, followed by a packed-switch on i). Not all switches are implemented like this. Regular if-conditional trees may be strategically generated by optimizing compilers.

Inner classes, Anonymous classes

We improved rendering support for named- and anonymous-inner classes. Properly rendering anonymous classes in particular is made difficult by the fact that some of its arguments are captured from the outer classes. Properly rendering anonymous constructors, with exact argument types and position, is also challenging.

Lately, a user sent us a sample making use of an anonymous class initializer to hide string decryption code. See below:

  • The anonymous class extends Android’s OnActivityResultListener, instantiates the object, and tosses it immediately.
  • Decryption code takes place in the initializer. Note the captured arguments from the outer container method __m: i, _b. Access to other private class fields is made via synthetic accessor calls that were re-sugared into seemingly direct field access (BA._b).
Pseudo-moot anonymous class with an instance initializer attempting to conceal string decryption code.

Plugin options

Remember that some decompiler properties are publicly available in the options: (menu: Edit, Options, Advanced, Engines)

  • All Dalvik decompilation options: see the .parsers.dcmp_dex.* namespace
  • All Java rendering options of decompiled code: see the .parsers.dcmp_dex.text.* namespace

1)Rendering options are real-time options that can be changed after the fact to customize the output. Right-click on a decompiled class output, and select Rendering Options:

2) Decompilation options are used to guide and customize the decompilation. They can be changed in the Engines options, or more simply, when performing a decompilation itself, by invoking “Decompile with Options…” instead of “Decompile”.

Keyword for “Decompile with Options”:
CTRL+TAB (Windows, Linux) or COMMAND+TAB (macOS)

Bring up the “Decompile with Options” dialog by using CTRL+TAB/COMMAND+TAB when decompiling. Hover over properties to get extra documentation in the tooltip.

API additions

Essential updates to:

  • IJavaSwitch: additional methods to access switch-on-enum and switch-on-string data
  • IJavaForEach: additional type introduced to manipulate for-each statements: for(Type var: iterator_or_array) { … }

Other changes, What next

JEB 3.2 contains other improvements, such as:

  • Better auto-naming, including default usage of debug data, if present (can be disabled in the options)
  • Improved typing and type propagation
  • Additional IR and AST optimizations
  • Better exceptional flow processing
  • Rendering of try-catch, synchronized blocks, etc.
  • Decompilation of invoke-polymorphic (invoke-custom is not supported, see below the part on lambdas on method handles)

We have more planned for the coming releases, including:

  • Improved support for switch-on-string. As said earlier, some of those switches, when properly detected, are re-sugared into legal Java-8 switch-on-string. However, the nature of those high-level constructs (they are implemented as double-conditionals, sometimes double-switches) makes it quite hard in some cases to provide proper reconstruction. It is something that will be improved in the future.
  • Support for generics. We had decided to not implement Java 5-style type generic since the information, when provided, is stored as pure metadata and should not be trusted. However, in practice, it turns out to be helpful when auditing legitimate, non-obfuscated compiled apps. We will add optional support for that in a coming release.
  • Support for try-with-resources. try(resource)/catch/finally are difficult very-high-level idioms to reconstruct. Optimizing compilers generate a substantial amount of additional, highly optimized code to implicitly catch exceptions and auto-close resources, making it extra difficult to reconstruct in the general case. We will likely introduce partial support before the summer.
  • Lambdas. It is a planned addition. We will soon be re-sugaring Android implementations of Java 8+ lambdas into proper lambda functions. Same goes for method handles (::). That’s quite exciting and may pave the way for a hypothetical Kotlin decompiler, since that language implicitly and explicitly rely on lambdas extensively.

Debuggable APK Generation

For several reasons, it is easier to debug Android applications explicitly marked debuggable in their Manifest.

  • Debugging non-debuggable APK requires root access to the operating system. Which means rooting a production phone, using an emulator
    2 image built as userdebug, or building a custom userdebug image from AOSP.
  • Any of the above solutions have shortcomings: rooted production builds and userdebug builds expose features that non-rooted production builds do not have, and can be fingerprinted as such; Debugging native code of applications on non-rooted devices requires replacing system-level utilities; the API level and OS features also play a role, eg, SE-Android needs to be disabled on recent OS in order for debugging to work.

In many cases, rebuilding a release app into a debug-mode app (with <application android:debuggable=”true” …>) is a viable solution, and one that does not require using root, obviously. Many users are implementing this solution via apktool. However it is frequent for the tool to fail decoding complex APKs, let alone rebuild them with different settings.

We have introduced a feature in JEB that makes rebuilding non-debuggable APK to debuggable APK easy and fast:

$ jeb_wincon.bat -c --makeapkdebug -- file.apk

Upon success, file_debuggable.apk will be generated. Sign it (Android SDK’s apksigner), install it on your device, and start debugging. Remember that this solution has its shortcomings as well! Anti-debugging code may check at runtime that the app is not debuggable, as would be expected. More elaborate solutions implement certificate pinning-style checks, where the code verifies that it is signed using a specific certificate. Be careful when debugging rebuilt APK.

This malware app was made debuggable

Keyboard Shortcuts for Script

Bind your JEB Python scripts to keyboard shortcut by adding a line at the top of your script:

#?shortcut=xxx

where xxx is your keyboard shortcut, eg: Ctrl+Shift+T

Permitted keyboard modifiers are Ctrl, Shift, Alt, as well as the generic Mod1, mapping to macOS’s Command (Apple) key, or Control on Windows/Linux.

Sublime Text 3 Extension

Are you writing Python scripts to automate your JEB reversing tasks? If so, give a try to using the “JEB Script Development Helper” package available on Sublime Text’s Package Control.

JEB Python scripts with Sublime Text

To install it:

  • Install ST3
  • Install Package Control
  • Open the Package Control and Install a new extension
  • Search for “JEB” and install the extension

The extension allows you to:

  • Auto-completion on JEB types and attributes
  • Auto-import JEB classes: CTRL+ALT+I on a class names
  • Easily create script skeleton (CTRL+SHIFT+P, “JEB: Create a new script”)
  • Easily update to the latest API doc, usually published right after a new release (CTRL+SHIFT+P, “JEB: Update to latest API doc file”)

API changes

Recent API changes are not specific to Android components of JEB. You will find updated sample code on GitHub.

  1. If you are seeing unintended changes or bugs related to this update, let us know so that we can fix things quickly.
  2. Emulator here means an emulator running a userdebug Android build, as Google-provided images are

JEB Native Analysis Pipeline – Part 2: IR Optimizers

In part 1 of this series, we gave an overview of the Intermediate Representation used by JEB’s Native Analysis Pipeline, as well as a simple Python script demonstrating how to use the API to access and print out IR-CFG of decompiled routines.

In part 2, we continue our exploration of JEB IR. We will show how to write a custom IR optimizer plugin to clean-up a custom obfuscation used in a piece of code. The resulting decompiled C code will end up very readable as well.

Before you proceed, make sure to update JEB Pro to version 3.1.1+.

Obfuscated Crypto-stealer Code

The sample we are going to look at monitors Windows clipboards for cryptocurrency-looking wallet addresses, and replaces them with a desired target address. The sample is specifically targeting Ethereum wallet addresses. It is a neutered final stage payload – the recipient address has been scrambled to render the code ineffective.

PE characteristics of file 1.exe

Although the payload is unpacked, what is interesting is that one of its key routines is obfuscated: custom garbage code was inserted.

Junk (useless) assignments

The garbage code is easy to go through: a bit of manual analysis shows that junk instructions are assigning pseudo-random values to an array whose bytes are never used. Two types of assembly patterns are present:

1- mov dword ptr [edi + offset], junk_value ; edi previously init. to
; junkarray address
2- push junk_value
pop dword ptr [junkarray_address + offset]

If we decompile that code and look at the final IR (as shown below), we can see that those instructions ended up being converted and optimized to the following type of assignment:

Assign(Mem(mem_address), Imm(junk_value))

Currently, the decompiled code looks like the following, hard-to-digest blob:

Snippet of decompiled code (obfuscated)

Although quite painful to read, we can follow the program’s logic by abstracting away the junk assignments. (Essentially, win32 functions’ OpenClipboard, GetClipboardData, and SetClipboardData are used to retrieve, check, and replace copy-pasted Ascii and Unicode text, if they match the following pattern “/0x(..){20}/”. The replacement string target wallet address, previously decrypted by sub_401000.)

Cleaning the Intermediate Representation

Recall that the native analysis pipeline can be simplifed as the workflow below:

CodeObject (*)
-> Reconstructed Routines & Data
-> Conversion to IR (low-level, non-optimized)
-> IR Optimizations <--- this is where we'll work
-> Final IR (higher-level, optimized, typed)
-> Generation of AST
-> AST Optimizations
-> Final AST (final, cleaned)
-> High-level output (eg, C variant)

Our custom IR optimizer will look for junk assignments and remove them. The important criteria are: What is the junk array start and end addresses? Is it common to all routines in the binary, or is there one array per routine? Those questions may be hard to answer in the general case. However, for our specific sample file, we can assert with a high-degree of certainty that the junk array:
– starts at address 0x415882
– is at most 256 bytes long
– is used solely by sub_401171, the routine we want to analyze

Because of the above restrictions, the IR optimizer we are going to write should be qualified as a custom or ad-hoc IR optimizer. Chances are, we won’t be able to reuse it as-is in other programs without some amount of tweaking.

Let’s get started, we will:
– create an Eclipse project with scaffold code for a Java back-end plugin
– write and test a custom IR optimizer with a headless client
– deploy the plugin and make it usable and accessible from the UI desktop client

Creating a Plugin Project

Before we proceed, make sure to:

  • Define an environment variable JEB_HOME, that points to your JEB installation folder
  • Install Eclipse IDE

Then:

  • Clone the jeb-native-ir-optimizer-example1 repository.
  • Create an Eclipse project by running:
    • On Windows: create-eclipse-project.cmd
    • On Linux/macOS: create-eclipse-project.sh
  • Open Eclipse and import the newly-created project into your Workspace (File, Import, Existing Projects into the Workspace, select the cloned repository folder, proceed)
Importing an existing project into Eclipse

Debugging the Obfuscation

Now that your project is imported in Eclipse, you should be able to see two source files in src’s default package:

  • Tester.java
  • EOptExample1.java

EOptExample1 is the IR optimizer plugin we will be working on. (Note that several classes of plugins exist, this one is a native IR optimizer, and therefore inherits from AbstractEOptimizer or one its subclasses.)

Tester creates a headless JEB instance that loads the plugin EOptExample1.

Package Explorer view of the newly-created project

Tester.java does the following:

  • Create a JEB instance 1
  • Load the test plugin EOptExample1
  • Then, create a JEB project and load the artifact file samples/1.exe (IMPORTANT: unzip 1.zip to 1.exe first – password: password)
  • Analyze the artifact
  • Retrieve a handle on the native decompiler
  • Retrieve a handle on the to-be-analyzed routine sub_401171
  • Perform a full decompilation of that routine

Let’s have a preliminary look at EOptExample1: This IR optimizer type is set to STANDARD, which is not ideal when you use custom optimizers tailored for specific code. A better IR optimizer class for those is ON_DEMAND: those optimizers are to be manually invoked, e.g. from JEB UI (menu: File, Advanced Unit Options). However, during development, since we are focusing on a particular file and routine, STANDARD type may be fine. Standard optimizers are called during regular IR optimization phases of the decompilation pipeline.

public class EOptExample1 extends AbstractEOptimizer {

    public EOptExample1() {
        super(DataChainsUpdatePolicy.UPDATE_IF_OPTIMIZED);
        getPluginInformation().setName("Sample IR Optimizer #1");
        getPluginInformation().setDescription("Remove IR-statements reduced to \"*(&garbage + delta) = xxx\"");
        getPluginInformation().setVersion(Version.create(1, 0, 0));
        // Standard optimizers are normally run, as part of the IR optimization stages in the decompilation pipeline
        setType(OptimizerType.STANDARD);
    }

    // replace all IR statements previously reduced to EMem ("[junk_address] = xxx") to ENop
    @Override
    public int perform(boolean updateDFA) {
        logger.info("IR-CFG before running custom optimizer \"%s\":\n%s", getName(),
                DecompilerUtil.formatIRCFGWithContext(2, cfg, ectx));
        // ...
        // optimizer code
    }
}

Note the plugin’s data-chains update policy, set to UPDATE_IF_OPTIMIZED. Optimizations that specify this flag tell their runner, aka the master optimizer that orchestrate them, that identifiers may be modified – hence, if optimizations occurred, a data flow analysis (DFA) pass needs to take place again. DFA update policies are a topic for another article.

Lines 3-5 are plugin metadata information, such as name and description, authorship, version numbers (including minimum/maximum JEB back-end versions), etc.

Before we deep-dive into perform(), let’s first set a breakpoint on line 15, where logger.info(…) is called. Then, start a debugging session for Tester: menu Run, command Debug (hotkey: F11.)

After a few seconds of analysis, your breakpoint should be hit; it corresponds to the first-time invocation of your custom optimizer. The logger prints out the IR-CFG that’s about to be optimized. Let’s have a look at it:

IR-CFG before running custom optimizer "Sample IR Optimizer #1":
>> IN(@0): ecx={@D} esp={@0} ebp={@1} ss={@1,@C,@18,@1D,@21,@24,@25,@27,@30,@35,@38,@3B,@3E,@3F,@41,@43,@46,@4F,@51,@54,@56,@59,@5C,@5D,@5F,@6B,@77,@81,@84,@9B,@9E,@A0,@AC,@B8,@BA,@BD,@BF,@C3,@C5,@C7,@CB,@CD,@D1,@D2,@D4,@E0,@E9,@EF,@F1,@F5,@F7,@FB,@FC,@FE,@100,@103,@106,@107,@109,@10C,@10E,@112,@114,@116,@11A,@11C,@11F,@122,@123,@12E,@131,@133,@137,@139,@13D,@13E,@140,@143,@145,@149,@14B,@14F,@150,@152,@15D,@173,@176,@179,@17C,@17D,@17F,@181,@18A,@18C,@18F,@191,@194,@196,@19A,@19D,@19E,@1A0,@1B3,@1BF,@1C2,@1D9,@1DC,@1E0,@1EC,@1EF,@1F1,@1F3,@1F7,@1F9,@1FC,@1FF,@202,@203,@205,@211,@21D,@220,@222,@226,@227,@229,@22B,@22E,@231,@232,@234,@237,@23A,@23C,@23E,@242,@244,@246,@24A,@24C,@250,@251,@25C,@25F,@262,@263,@265,@268,@26A,@26E,@271,@272,@27A,@27F,@295,@298,@29A,@29D,@2A4,@2A9,@2AD,@2B0,@2B2,@2B6,@2B7,@2BA,@2BC,@2C0,@2C2,@2C6,@2C7,@2CA,@2CD,@2D2,@2DE,@2E4,@2E7,@2E8,@2EA} ds={@F,@11,@19,@1E,@22,@28,@31,@36,@39,@3C,@40,@44,@4C,@4E,@52,@55,@57,@5A,@60,@6C,@74,@78,@7B,@82,@85,@86,@87,@8E,@90,@92,@9C,@A1,@A3,@A4,@AD,@B5,@B7,@BB,@C0,@C4,@C8,@CE,@D5,@E1,@EA,@ED,@F2,@F8,@FD,@FF,@101,@104,@10A,@10F,@113,@117,@11B,@11D,@120,@124,@12F,@134,@13A,@13F,@141,@146,@14C,@153,@15A,@15C,@163,@165,@166,@167,@170,@171,@174,@177,@17A,@17E,@180,@187,@189,@18D,@192,@197,@19B,@1A1,@1AB,@1B4,@1B7,@1B9,@1C0,@1C3,@1C4,@1C5,@1CC,@1CE,@1D0,@1DA,@1DD,@1DE,@1E1,@1E9,@1ED,@1F0,@1F4,@1FA,@1FD,@200,@206,@212,@219,@21B,@21E,@223,@228,@22A,@22C,@22F,@235,@238,@23B,@23F,@243,@247,@24D,@252,@25B,@25D,@260,@264,@266,@26B,@26F,@273,@27B,@27E,@285,@287,@288,@289,@292,@293,@296,@29B,@2A5,@2AA,@2AE,@2B3,@2B8,@2BD,@2C3,@2C8,@2CE,@2D0,@2D3,@2DF,@2E1,@2E2,@2E5,@2EB} OpenClipboard={@25} GetClipboardData={@3F,@17D} GlobalAlloc={@FC,@227} GlobalLock={@107,@232} GlobalUnlock={@13E,@263} SetClipboardData={@150,@272,@2B7,@2C7} CloseClipboard={@2CB} Sleep={@2E8} sub_401000={@D} sub_405010={@5D} sub_404F80={@D2} sub_4024E0={@123,@251} sub_404E54={@19E} sub_404E14={@203} 
0000/1>  s32:_esp = (s32:_esp - i32:00000004h)                                                                 DU: esp={@1,@2,@B}                 | UD: esp={} 
0001/1:  32<s16:_ss>[s32:_esp] = s32:_ebp                                                                      DU:                                | UD: esp={@0} ebp={} ss={} 
0002/9:  s32:_ebp = s32:_esp                                                                                   DU: ebp={@38,@41,@46,@4F,@54,@56,@84,@9E,@B8,@BD,@C5,@FE,@100,@10C,@114,@11C,@131,@140,@15D,@176,@17F,@181,@18A,@18F,@194,@1C2,@1DC,@1EF,@1F1,@1FC,@229,@22B,@237,@23C,@244,@25C,@265,@27F,@298,@29D}  | UD: esp={@0} 
000B/1:  s32:_esp = (s32:_esp - i32:0000002Ch)                                                                 DU: esp={@C,@D,@17}                | UD: esp={@0} 
000C/1:  32<s16:_ss>[s32:_esp] = i32:0040117Ch                                                                 DU:                                | UD: esp={@B} ss={} 
000D/1:  call s32:_sub_401000(s32:_ecx)->(s32:_eax){32[s32:_esp]}                                              DU: eax={}                         | UD: ecx={} esp={@B} sub_401000={} 
000E/1+  s32:_edi = i32:00415882h                                                                              DU: edi={}                         | UD: 
000F/1:  32<s16:_ds>[i32:00415944h] = i32:E2E60682h                                                            DU:                                | UD: ds={} 
0010/1:  s32:_eax = i32:00000001h                                                                              DU: eax={}                         | UD: 
0011/6:  32<s16:_ds>[i32:00415904h] = i32:7C64C0E4h                                                            DU:                                | UD: ds={} 
0017/1:  s32:_esp = (s32:_esp - i32:00000004h)                                                                 DU: esp={@18,@1A}                  | UD: esp={@B,@2EC} 
0018/1:  32<s16:_ss>[s32:_esp] = i32:E87A1612h                                                                 DU:                                | UD: esp={@17} ss={} 
0019/1:  32<s16:_ds>[i32:004158DDh] = i32:E87A1612h                                                            DU:                                | UD: ds={} 
001A/1:  s32:_esp = (s32:_esp + i32:00000004h)                                                                 DU: esp={@1C}                      | UD: esp={@17} 
001B/1:  nop                                                                                                   DU:                                | UD: 
001C/1+  s32:_esp = (s32:_esp - i32:00000004h)                                                                 DU: esp={@1D,@20}                  | UD: esp={@1A} 
001D/1:  32<s16:_ss>[s32:_esp] = i32:CCA4A4A0h                                                                 DU:                                | UD: esp={@1C} ss={} 
001E/2:  32<s16:_ds>[i32:004158CAh] = i32:CCA4A4A0h                                                            DU:                                | UD: ds={} 
0020/1:  s32:_esp = s32:_esp                                                                                   DU: esp={@21,@23}                  | UD: esp={@1C} 
0021/1:  32<s16:_ss>[s32:_esp] = i32:00000000h                                                                 DU:                                | UD: esp={@20} ss={} 
0022/1:  32<s16:_ds>[i32:00415951h] = i32:249E4228h                                                            DU:                                | UD: ds={} 
0023/1:  s32:_esp = (s32:_esp - i32:00000004h)                                                                 DU: esp={@24,@25,@26}              | UD: esp={@20} 
0024/1:  32<s16:_ss>[s32:_esp] = i32:004011CAh                                                                 DU:                                | UD: esp={@23} ss={} 
0025/1:  call s32:_OpenClipboard(32<s16:_ss>[(s32:_esp + i32:00000004h)])->(s32:_eax){32[s32:_esp]}            DU: eax={@33}                      | UD: esp={@23} ss={} OpenClipboard={} 
...
... (trimmed)
...

The above IR listing is a human-friendly representation of IR statements. The general format of this listing is:

cnt   what
1 >> IN(@EntryOffset){live_inputs}
1* offset/lengthC <insn> | DU:<def-use-chains> UD:<use-def-chains>
0+ << OUT(@ExitOffset){reaching_outputs}


- offset: IR statement offset
- length: IR statement length (generally, 1)
- C: indicates whether the instruction is
- the entry-point instruction (>)
- the first of a basic-block (+)
- any other instruction (:)
- insn: IR statement instruction (refer to Part 1 of this blog series)
- DU/UD: routine def-use and use-def chains
- IN: live input variables at the entry-point
- OUT: reaching output variables at a given exit point
We breakpoint’ed on logger.info(), and single-stepped one line. The output can be seen in the console view. It may be better (depending on how large your console buffer is) to examine the full output dumped to jeb-plugin-tester.log in your Temp folder.

The IR listing is relatively readable, although quite verbose at this early stage of optimization (roughly, the first pass in tier 1 of the analysis pipeline). The important idioms to look at here are:

Preliminary conversion of low-level junk inserts

a/ The first one is an Assign(Mem(Imm), Imm), which corresponds to optimized “mov [edi + offset], value”, where the value of edi was determined, propagated further, and the addition folded and converted to an immediate address.

b/ The second one is a partially optimized “push value / pop [address]”. Later optimizations phases will find and remove esp updates or esp-based operations, as was shown in the pseudo-code earlier. What we need to focus on here is the Assign(Mem(Imm), Imm), like the one in a/.

Those are the bits we will look for and modify: Assuming those assignments are useless, we will simply replace them by Nop statements.

Writing the Optimizer

At this point, our preliminary understanding of the obfuscation is enough to start writing the clean-up optimizer. Its code is extremely simple, for two main reasons:
– The obfuscation scheme itself is relatively trivial
– Other built-in JEB optimizers are giving us clean IR assignments to work on

Let’s look at the code of proceed():

    @Override
    public int perform(boolean updateDFA) {
        final long garbageStart = 0x415882;
        final long garbageEnd = garbageStart + 0x100;        
        int cnt = 0;
        for(int iblk = 0; iblk < cfg.size(); iblk++) {
            BasicBlock<IEStatement> b = cfg.get(iblk);
            for(int i = 0; i < b.size(); i++) {
                IEStatement stm = b.get(i);
                if(!(stm instanceof IEAssign)) {
                    continue;
                }
                IEAssign asg = (IEAssign)stm;
                if(!(asg.getLeftOperand() instanceof IEMem)) {
                    continue;
                }
                IEMem target = (IEMem)asg.getLeftOperand();
                if(!(target.getReference() instanceof IEImm)) {
                    continue;
                };
                IEImm wraddr = (IEImm)target.getReference();
                if(!wraddr.canReadAsAddress()) {
                    continue;
                }
                long addr = wraddr.getValueAsAddress();
                if(addr < garbageStart || addr >= garbageEnd) {
                    continue;
                }
                b.set(i, ectx.createNop(stm));
                cnt++;
            }
        }
        return postPerform(updateDFA, cnt);
    }

This optimizer inherits from AbstractEOptimizer. Therefore, the perform() method works on an IR-CFG. (Not all optimizers may choose to do so; it is sometimes easier to work directly on statements or expressions.)

process() goes through all statements or every basic block of the IR-CFG. Using the instanceof operator, we check that the statement is an assignment such as: Mem(address) = Imm. The address is retrieved, and we make sure that it falls within the junk array. If those checks succeed, we replace the assignment by a Nop.

And that is it. Clean and simple – although, not quite portable, since the junk array address and size are hard-coded into the code! But that is not the point of this blog, and neither is portability a first-class goal when writing optimizers for custom code.

Next up, let’s see how to use the plugin in an interactive session using the desktop client.

Building, Deploying, Interactive Use

In order to use the optimizer within the JEB desktop client, we either:

  • Register the plugin as a development plugin;
  • Or build the plugin as a Jar and drop it in JEB’s coreplugins/ folder.

Development Plugin

This is the easiest option. You may consider it as an intermediate step between prototyping with the headless client, as demonstrated above, and a full-blown, deployed Jar plugin.

Open the Options panel, Development tab, tick the option “Development Mode”, add the bin/ folder of your plugin’s project to the classpath, and add the classname of your plugin entry-point:

Setting up a development plugin in JEB UI

Press OK and restart JEB. Your plugin will be loaded and ready to use. You may now skip to the section “Using the IR optimizer plugin”.

Building a Jar plugin

The alternative is to run build.cmd (on Windows) or build.sh (on Linux/macOs), which calls an Ant script in the scripts/ folder, therefore, make sure to have Ant installed on your system first. You may also customize the plugin name and version before building.

The resulting Jar plugin file will be generated in your project’s out/ folder. Copy it to your JEB coreplugins/ folder and start the JEB client. Your plugin will be automatically loaded, along with the other plugins.

Using the IR Optimizer Plugin

If your plugin has the type STANDARD (default), then, as explained earlier, it will be invoked by the optimizations’ orchestrator automatically, at various times during the decompilation pipeline. If that’s the mode you’d like to choose, make sure that your plugin is generic enough to handle all types of input routines, else you’re in for some strange surprises if you ever forget to remove it from your coreplugins/ folder.

An alternative is to convert it to an on-demand plugin:

public EOptExample1() {
        super(DataChainsUpdatePolicy.UPDATE_IF_OPTIMIZED);
        getPluginInformation().setName("Sample IR Optimizer #1");
        getPluginInformation().setDescription("Remove IR-statements reduced to \"*(&garbage + delta) = xxx\"");
        getPluginInformation().setVersion(Version.create(1, 0, 0));

        // Standard optimizers are normally run, as part of the IR optimization stages in the decompilation pipeline
        //setType(OptimizerType.STANDARD);

        // alternative (better for production / in UI use):
        setType(OptimizerType.ON_DEMAND);
        setPreferredExecutionStage(-NativeDecompilationStage.LIFTING_COMPLETED.getId());
        setPostProcessingActionFlags(PPA_OPTIMIZATION_PASS_FULL);
    }

– Line 11 makes the optimizer on-demand. Users must manually activate it, on specific code.
– Line 12 is recommended for on-demand optimizers: we specify at which point in in the pipeline the plugin should be called.
– Finally, we set some post-processing flags, specifying that a full round of standard optimizations must be performed after our custom optimizer has run: this will allow cleaning up code remnants, and optimize our IR-CFG further – something made possible after running an optimization pass like this one.

On-demand optimizer plugins show up in the File, Advanced Unit Options dialog box, that you may bring up when a decompiled routine has the focus:

List of on-demand optimizers managed by a given decompiler instance

Tick the optimizer box, press OK. The routine will be re-decompiled.

Clean Code

Regardless which method you choose, once cleaned up, the IR will allow for better downstream pipeline phases, including typing, AST generation, AST optimizations, etc.

The pseudo-C code has become quite readable:

The same decompiled method, after deobfuscation by the custom plugin.

Conclusion

That is it for part 2. We scratched the surface of IR optimizers (which themselves are a relatively small – albeit important – part of the overall decompilation pipeline 2) but it’s a good start. I strongly encourage you to experiment and ask your questions on our Slack channel. One ongoing effort right now is to bring the API documentation up to speed in terms of contents and sample code.

In part 3, we will continue exploring IR optimizers. Later on in the series, we will show how to write AST optimizers 3, how to write decompilation modules, and show how existing decompilers can be cutomized further. Stay tuned!

  1. JEB must have been previously run, at least once: EULA accepted, license key generated, etc.
  2. The decompilation pipeline is one component of the native analysis pipeline, which is one module, among tens, of the JEB back-end: the public API is worth exploring if you’re into advanced use cases.
  3. AST generation is one of the very final decompilation phases – working on the syntax tree serves different purposes than working on the IR

JEB Native Analysis Pipeline – Part 1: Intermediate Representation

JEB native code analysis components make use of a custom intermediate representation (IR) to perform code analysis.

Some background: after analysis of a code object, the native assembly of a reconstructed routine is converted to an intermediate representation. 1 That IR subsequently goes through a series of transformation passes, including massages and optimizations. Final stages include the generation of high-level C-like code. Most stages in this pipeline can be customized by users via the use of plugins. A high-level, simplified view of the pipeline could be as follows:

CodeObject (*)
-> Reconstructed Routines & Data
-> Conversion to IR (low-level, non-optimized)
-> IR Optimizations
-> Final IR (higher-level, optimized, typed)
-> Generation of AST
-> AST Optimizations
-> Final AST (final, cleaned)
-> High-level output (eg, C variant)

(*) Examples of code objects: a Windows PE file with x86-code, an ELF library with with MIPS code, a headless ARM firmware, a Wasm binary file, an Ethereum smart contract, etc.

Two important JEB API components to hook into and customize the native analysis pipeline are:
– The IR classes
– The AST classes
We will start looking at IR components through the rest of this part 1.

IR Description

JEB IR can be seen as a low-level, imperative assembly language, made of expressions. Highest-level expressions are statements. Statements contain expressions. Generally, expressions can contain expressions. IR can be accessed via interfaces in the JEB API. The top-level interface for all IR expressions is IEGeneric. All IR elements start with IExxx. 2

The diagram below shows the current hierarchy of IR expression interfaces:

Note that IEGeneric sits at the top. All other IRE’s (short for IR Expressions from now on) derive from it. Let’s go through those interfaces:

  • IEImm: Integer immediate of arbitrary length. Eg,
    Imm(0x1122, 64) would represent the 64-bit integer value 0x1122.
  • IEVar: Generic IRE to represent variables. Variables can represent underlying physical registers, virtual registers, local function variables, global program variables, etc.
  • IEMem: Piece of memory of arbitrary length. The memory address itself is an IRE; the accessed bitsize is not.
  • IECond: A ternary expression “c ? a: b”, where a, b and c are IRE’s.
  • IERange: A fixed integer range, commonly used with Slice
  • IESlice: A chunk (contents range) of an existing IR. Eg, Slice(Imm(0x11223344, 32), 16, 24)) can be simplified to Imm(0x22, 8)
  • IECompose: The concatenation of two or more IRE’s (IR0, IR1, …), resulting in an IR of size SUM(i=0->n, bitsize(IRi))
  • IEOperation: A generic operation expression, with IRE operands and an operator. Eg, Operation(ADD,Imm(0x10,8),Mem(Imm(0x10000,32),8)). Most standard operators are supported, as well as less standard operators such as the Parity function or Carry function.)
  • IEStatement: the super-interface for IR statements; we will detail them below.

An IR translation unit, resulting from the conversion of a native routine, consists of a sequential list of IEStatement objects. An IR statement has a size (generally, but not necessarily, 1) and an address (generally, a 0-based offset relative to its position in the translation unit).

As of JEB 3.0.8, IR statements can be:

  • IEAssign: The most common of all statements: an assignment from a right-side source to a left-side destination. While the source can be virtually anything, the destination IRE is restricted to a subset of expressions.
  • IENop: This statement does nothing but consumes virtual size in the translation unit.
  • IEJump: An unconditional or conditional jump within the translation unit, expressed using IR offsets.
  • IEJumpFar: An unconditional or conditional far jump (can be outside the translation unit), expressed using native addresses.
  • IESwitch: The N-branch equivalent of IEJump.
  • IECall: Represent a well-formed static or dynamic dispatch to another IR translation unit. The dispatch expression can be any IRE (eg, an Imm for a static dispatch; a Var or Mem for a dynamic dispatch).
  • IEReturn: A high-level expression used to denote a return-to-caller from a translation unit representing a routine. This IRE is always introduced by later optimization passes.
  • IEUntranslatedInstruction: This powerful statement can be used to express anything. It is generally used to represent native instructions that cannot be readily translated using other IR expressions. (Users may see it as an IECall on steroid, using native addresses. In that sense, it is to IECall what IEJumpFar is to IEJump.)

Now, let’s look at a few examples of conversions.

IR Examples

Let’s assume the following EVars were previously defined by an Intel x86 (or x86-64) converter: tmp (a 32-bit EVar representing a virtual placeholder register); eax (an EVar representing the physical register %eax); ?f (1-bit EVars representing standard x86 flags).

  • x86: mov eax, 1
s32:_eax = s32:00000001h

Translating this mov instruction is straight-forward, and can be done with a single Assign IR statement.

  • x86-64: not r9d
s64:_r9 = C(~(s64:_r9[0:32[), i32:00000000h)

Translating a not-32-bit-register on an x86-64 platform is slightly more complex, as the upper 32-bit of the register are zeroed out. Here, the converter is making use of three nested IREs: (IECompose(IEOperation(NOT, Slice(r9, 0, 32))))

Reading IR. IECompose are pretty-printed as C(lo, …, hi), IESlice as Expr[m:n[ 

  • x86-64: xor rax, qword ds:[ecx+1]
0000 : s64:_rax = (s64:_rax ^ 64<s16:_ds>[(s64:_rcx[0:32[ + i32:00000001h)])
0001 : s1:_zf = (s64:_rax ? i1:0 : i1:1)
0002 : s1:_sf = s64:_rax[63:64[
0003 : s1:_pf = PARITY(s64:_rax[0:8[)
0004 : s1:_of = i1:0
0005 : s1:_cf = i1:0

One side-effect of arithmetic operations on x86 is the modification of flag registers. A converter explicits those side effects. Consequently, translating the exclusive-or above resulted in several Assign IR statements to represent register and flags updates. 3

Reading IR. IEMem are pretty-printed as bitsize<SegmentIR>[AddressIR]

  • x86: add eax, 2
0000 : s32:_tmp = s32:_eax
0001 : s32:_eax = (s32:_eax + i32:00000002h)
0002 : s1:_zf = (s32:_eax ? i1:0 : i1:1)
0003 : s1:_sf = s32:_eax[31:32[
0004 : s1:_pf = PARITY(s32:_eax[0:8[)
0005 : s1:_af = ((s32:_tmp ^ i32:00000002h) ^ s32:_eax)[4:5[
0006 : s1:_cf = (s32:_tmp CARRY i32:00000002h)
0007 : s1:_of = ((s32:_tmp ^ s32:_eax) & ~((s32:_tmp ^ i32:00000002h)))[31:32[

The translation of add makes use of the temporary, virtual EVar tmp. It holds the original value of %eax, before the addition was done. That value is necessary for some flag update computations (eg, the overflow flag.) Also take note of the use of special operators Parity and Carry in the converted stub.

  • x86-64: @100000h: jz $+1
s64:_rip = (s1:_zf ? i64:0000001000000003h : i64:0000001000000002h)

Note that a native address is written to the RIP-IEVar (or any EVar representing the Program Counter – PC). PC-assignments like those can later be optimized to IEJump, making use of IR Offsets instead of Native Addresses.

Also note that the Control Flow Graph (CFG) of the native instruction in the examples thus far are isomorphic to their IR-CFG translated counterparts. That is not always the case, as seen in the example below.

  • x86: repe cmpsb
0000 : if (s32:_ecx == i32:00000000h) goto 000B
0001 : s1:_zf = ((8<s16:_ds>[s32:_esi] - 8<s16:_es>[s32:_edi]) ? i1:0 : i1:1)
0002 : s1:_sf = (8<s16:_ds>[s32:_esi] - 8<s16:_es>[s32:_edi])[7:8[
0003 : s1:_pf = PARITY((8<s16:_ds>[s32:_esi] - 8<s16:_es>[s32:_edi]))
0004 : s1:_cf = (8<s16:_ds>[s32:_esi] <u 8<s16:_es>[s32:_edi])
0005 : s1:_of = ((8<s16:_ds>[s32:_esi] ^ (8<s16:_ds>[s32:_esi] - 8<s16:_es>
       [s32:_edi])) & (8<s16:_ds>[s32:_esi] ^ 8<s16:_es>[s32:_edi]))[7:8[
0006 : s1:_af = ((8<s16:_ds>[s32:_esi] ^ 8<s16:_es>[s32:_edi]) ^ (8<s16:_ds>
       [s32:_esi] - 8<s16:_es>[s32:_edi]))[4:5[
0007 : s32:_esi = (s32:_esi + (s1:_df ? i32:FFFFFFFFh : i32:00000001h))
0008 : s32:_edi = (s32:_edi + (s1:_df ? i32:FFFFFFFFh : i32:00000001h))
0009 : s32:_ecx = (s32:_ecx - i32:00000001h)
000A : if s1:_zf goto 0000

Reading IR. conditional IEJump are pretty-printed “if (cond) goto IROffset”. Unconditional IEJump are rendered as simple “goto IROffset”.

This IR-CFG is not isomorphic to the native CFG. Additional edges (per the presence of 2x IEJump) are used to represent the compare “[esi+xxx] to [edi+xxx]” loop.

Accessing IR

The JEB back-end API allows full access to several IR-CFG’s, from low-level, raw IR to partially optimized IR, to fully lifted IR just before AST generation phases.

Navigating the IR in the GUI

The UI client currently provides access to the most optimized IR of routines. Those IR-CFG’s can be examined in the apt-named fragment right next to the source fragment showing decompiled code. Here is an example of a side-by-side assemblies (x86, IR). The next screenshot shows the decompiled source.

Left-side: x86 routine / Right-side: optimized IR of the converted routine
(Click to enlarge)
Decompiled source

IR via API

The API is the preferred method when it comes to power-users wanting to manipulate the IR for specific needs, such as writing a custom optimizer, as we will see in the next blog in this series.

Reminder: JEB back-end plugins can be written in Java (preferably) or Python. JEB front-end scripts can be written in Python, and can run both in headless clients (eg, using the built-in command line client) or the UI client.

For now, let’s see how to write a Python script to:

  • Retrieve a decompiled routine
  • Get the generated Intermediate Representations
  • Print it out

The following script does retrieve the first internal routine of a Native unit, decompiles it, retrieve the default (latest) IR, and prints out its CFG. The full scripts is available on GitHub.

# retrieve `unit`, the code unit

# GlobalAnalysis is assumed to be on (default)
decomp = DecompilerHelper.getDecompiler(unit)
if not decomp:
  print('No decompiler unit found')
  return

# retrieve a handle on the method we wish to examine
method = unit.getInternalMethods().get(0)#('sub_1001929')
src = decomp.decompile(method.getName(True))
if not src:
  print('Routine was not decompiled')
  return
print(src)
    
decompTargets = src.getDecompilationTargets()
print(decompTargets)

decompTarget = decompTargets.get(0)
ircfg = decompTarget.getContext().getCfg()
# CFG object reference
# see package com.pnfsoftware.jeb.core.units.code.asm.cfg
print("+++ IR-CFG for %s +++" % method)
print(ircfg.formatSimple())

Running on Desktop Client. Run this script in the UI client via File, Scripts, Run… (hotkey: F3). Remember to open a binary file first, with a version of JEB that ships with the decompiler for that file’s architecture.

Running on the command-line. You may also decide to run it on the command-line. Example, on Windows:

$ jeb_wincon.bat -c --srv2 --script=PrintNativeRoutineIR.py -- winxp32bit/notepad.exe

Example output:

... <trimmed>
...
+++ IR-CFG for Method{sub_1001929}@1001929h +++
0000/1>  s32:_$eax = 32<s16:_$ds>[s32:_gvar_100A4A8]
0001/1:  if !(s32:_$eax) goto 0003
0002/1+  call s32:_GlobalFree(s32:_$eax)->(s32:_$eax){i32:0100193Ch}
0003/1+  s32:_$eax = 32<s16:_$ds>[s32:_gvar_100A4AC]
0004/1:  if !(s32:_$eax) goto 0006
0005/1+  call s32:_GlobalFree(s32:_$eax)->(s32:_$eax){i32:01001948h}
0006/1+  32<s16:_$ds>[s32:_gvar_100A4A8] = i32:00000000h
0007/1:  32<s16:_$ds>[s32:_gvar_100A4AC] = i32:00000000h
0008/1:  return s32:_$eax

Conclusion

That is it for part 1. In part 2, we will continue our exploration of the IR and see how we can hook into the decompilation pipeline to write our custom optimizers to clean packer-specific obfuscation, as well as make use of the data flow analysis components available with the IR-CFG. Stay tuned!

  1. Working on IR presents several advantages, two of which being: a/ the reduction of coupling between the analysis pipeline and the input native architecture; b/ and offering a side-effect free representation of a program.
  2. The design choices of JEB IR are out-of-scope for this blog. They may be the subject of a separate document.
  3. When decompiling routines, IR optimization passes will iteratively refactor and clean-up unnecessary operations. In practice, most flag assignments will end up being removed or consolidated.