Android Analysis

Previous material

We recommend that readers be familiar with the platform-agnostic sections Actions, Views, and Decompiling before proceeding with Android-specific content.

JEB is a well-known industry tool used to reverse-engineer and audit Android applications. You will be able to:

  • Analyze APK files and their contents, including DEX files, Certificates, Resources, Assets, Native Library code, etc.
  • Examine encoded resource files and manifests, including resources with obfuscated names and locations.
  • Examine app certificates (legacy, v2, and v3).
  • Decompile DEX bytecode, with full support for multi-DEX reconstruction.
  • Analyze native library code (see the Native code analysis section).
  • Debug Android applications (Dalvik and Native - x86, arm, mips - code) and transition seamlessly from Dalvik to Native, and vice-versa.
  • Write your own extensions using the API (client scripts in Python, back-end plugins in Java).

Plugins#

The plugins used to analyze Android apps consist of:

  • The APK plugin is responsible for processing APK files. Encoded resources (arsc) are decoded by this plugin. Other jobs, such as analyzing dex files, analyzing certificates, processing asset files, analyzing binary files, etc. are delegated to appropriate plugins.
  • The DEX plugins: DEX analyzer (DEX parsing and merging, Dalvik disassembling, etc.), DEX decompiler, DEX debuggers.
  • Native code analyzers: disassemblers, decompilers, etc.
  • A handful of other plugins, such as Certificate parsers, XML/HTML/JSON/etc. parsers.

This section mostly focuses on the APK plugin and the DEX analyzer. Other plugins are documented in separate sections of this manual.

Technical Blogs#

Our blog is filled with technical posts that will help you make the most of JEB. It is the ideal companion to this manual. Link: All PNF Software blog posts tagged Android.

API Levels#

API levels are regularly mentioned throughout this document. Here is a list of "recent" Android versions, their corresponding API levels, as well as notable changes regarding security.

Codename | Version | API level | Date | New security features
R (DP3) | 11 | 30 | 2020 | Privacy updates, APK Signature Scheme v4
Q | 10 | 29 | 2019 | Permissions for privacy, BiometricPrompt
Pie | 9 | 28 | 2018 | ART: Vdex with Cdex, AS-FBE, biometric API, lockdown mode, APK Signature Scheme v3
Oreo | 8, 8.1 | 26, 27 | 2017 | ART: OAT with Vdex (oat w/o dex, separate vdex with dex'es), Google Play Protect
Nougat | 7, 7.1 | 24, 25 | 2016 | APK Signature Scheme v2, File Based Encryption (FBE - and consequently, DirectBoot), AS-FDE, Android Things
Marshmallow | 6 | 23 | 2015 | Adoptable Storage (AS), granular permissions and permission levels (NORMAL, DANGEROUS), Doze & App Standby, Android Wear
Lollipop | 5, 5.1 | 21, 22 | 2014 | ART: OAT with dex'es in .rodata, 64-bit support (x86_64, arm64-v8a), Android Auto
KitKat | 4.4, 4.4W | 19, 20 | 2013 | ART (optional), VerifiedBoot, Full Disk Encryption (FDE)
Jelly Bean | 4.1, 4.2, 4.3 | 16, 17, 18 | 2012 | SELinux introduction, multi-users

Android Versions Reference


APK Structure#

This section is a short primer on the Android Package file (APK). It is assumed that the reader is already familiar with the structure of an APK. Good introduction material can be found on the official Android developer portal.

An Android app is a zip file containing application code, data, and metadata.

  • Code: Dalvik (bytecode), Native (*.so libs)
  • Data: resources (structured), assets (unstructured)
  • Metadata: manifest (what), certificates (who)
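
The three categories above can be illustrated with a short script that sorts ZIP entries by their conventional names. This is only a sketch: the function names are hypothetical, and the rules rely on standard locations (classesN.dex at the root, res/, assets/, META-INF/), which protected apps do not necessarily respect.

```python
import io
import re
import zipfile

_DEX_NAME = re.compile(r"classes\d*\.dex")


def categorize_apk_entries(names):
    """Group ZIP entry names into code / data / metadata (naming heuristics only)."""
    groups = {"code": [], "data": [], "metadata": [], "other": []}
    for name in names:
        if _DEX_NAME.fullmatch(name) or name.endswith(".so"):
            groups["code"].append(name)
        elif name == "resources.arsc" or name.startswith(("res/", "assets/")):
            groups["data"].append(name)
        elif name == "AndroidManifest.xml" or name.startswith("META-INF/"):
            groups["metadata"].append(name)
        else:
            # Anything in a non-standard location needs manual review.
            groups["other"].append(name)
    return groups


def categorize_apk(path_or_file):
    """Open an APK as a plain ZIP archive and categorize its entries."""
    with zipfile.ZipFile(path_or_file) as apk:
        return categorize_apk_entries(apk.namelist())
```

Entries landing in the "other" bucket are a useful first hint that an app stores code or resources outside the usual folders.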

When JEB processes an APK, the resulting structure in the Project tree will differ from a raw ZIP tree view. The differences range from slight for regular apps to significant for obfuscated or complex apps.

The picture below shows a side-by-side comparison of processing an app as a ZIP file vs processing it as an APK:

  • Manifest: encoded VS decoded
  • Certificates: v1 (visible, in the META-INF/ folder), v2/v3 (embedded in the ZIP structure) VS parsed certificates
  • Bytecode: spread over classes.dex, classes2.dex, ..., classesN.dex VS virtual merged DEX unit
  • Resources: encoded and scattered in resources.arsc, res/, elsewhere (anywhere) VS decoded and reorganized resources
  • Native libs: 1-to-1
  • Assets: 1-to-1

Certificates#

As of Android 11, four types of signatures are in place to sign APKs, versions 1 through 4.

Version 1#

Version 1 is the legacy scheme supported by all versions of Android:

  • standard Jar signing (Oracle)
  • signing data goes in META-INF/
  • each individual file in the archive is signed
    • MANIFEST.MF: list of hashes of all files
    • xxx.SF: hash of hash entries in MANIFEST.MF
    • xxx.{RSA,DSA,...}: signature of xxx.SF + signer certificate (= what JEB displays)
    • note that xxx='CERT', usually
  • The apk/zip itself is not signed: this scheme is both inefficient and incomplete when the goal is to verify the APK as a whole
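
As a rough illustration of the first layer of the v1 scheme (the per-file hashes listed in MANIFEST.MF), the sketch below recomputes entry digests and compares them to the manifest. It assumes SHA-256-Digest attributes and no wrapped (continuation) lines, and it does not check the .SF file or the signature itself. All function names are hypothetical.

```python
import base64
import hashlib


def entry_digest(data):
    """Base64-encoded SHA-256 digest, the form stored in MANIFEST.MF."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


def parse_manifest_digests(manifest_text):
    """Return {entry name: digest} from MANIFEST.MF text (simplified parser)."""
    digests = {}
    current = None
    for line in manifest_text.splitlines():
        if line.startswith("Name: "):
            current = line[len("Name: "):]
        elif line.startswith("SHA-256-Digest: ") and current is not None:
            digests[current] = line[len("SHA-256-Digest: "):]
        elif not line.strip():
            current = None  # a blank line ends the current entry section
    return digests


def find_mismatches(manifest_text, files):
    """files: {entry name: bytes}. Return the names whose digest does not match."""
    mismatches = []
    for name, expected in parse_manifest_digests(manifest_text).items():
        if entry_digest(files.get(name, b"")) != expected:
            mismatches.append(name)
    return mismatches
```

Because only listed entries are checked, data appended outside those entries goes unnoticed, which is the incompleteness mentioned above.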

Versions 2/3#

Versions 2 and 3 are specific to Android:

  • What is signed is the APK as a whole
  • Uses a twist in zip format specifications
  • The global signing block is inserted just before the zip Central Directory (and can be located by looking for a magic number)
  • V3 = V2 + support for key rotation
  • What is displayed in the Certificate fragment is the signer's certificate, just like V1's
  • review the reference documentation for additional details
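
The sketch below shows one way to locate the signing block by its magic number and list the scheme ids it contains. It assumes the layout given in the Android signing documentation (a 16-byte magic "APK Sig Block 42" preceded by a 64-bit block size, placed right before the Central Directory, with ids 0x7109871a for v2 and 0xf05368c0 for v3). It finds the End of Central Directory record naively and ignores ZIP64; the function name is hypothetical.

```python
import struct

APK_SIG_BLOCK_MAGIC = b"APK Sig Block 42"
EOCD_SIGNATURE = b"PK\x05\x06"
SCHEME_IDS = {0x7109871A: "v2", 0xF05368C0: "v3"}


def find_signature_schemes(apk_bytes):
    """Return the scheme names (or hex ids) found in the APK Signing Block."""
    eocd = apk_bytes.rfind(EOCD_SIGNATURE)  # naive: ignores comment collisions
    if eocd < 0:
        raise ValueError("not a ZIP archive")
    # The Central Directory offset sits 16 bytes into the EOCD record.
    (cd_offset,) = struct.unpack_from("<I", apk_bytes, eocd + 16)
    if cd_offset < 32 or apk_bytes[cd_offset - 16:cd_offset] != APK_SIG_BLOCK_MAGIC:
        return []  # no signing block: the APK is at most v1-signed
    (block_size,) = struct.unpack_from("<Q", apk_bytes, cd_offset - 24)
    block_start = cd_offset - block_size - 8
    if block_start < 0:
        raise ValueError("corrupt signing block size")
    (leading_size,) = struct.unpack_from("<Q", apk_bytes, block_start)
    if leading_size != block_size:
        raise ValueError("signing block size fields disagree")
    found = []
    pos, end = block_start + 8, cd_offset - 24
    while pos + 12 <= end:
        # Each pair: uint64 length, then a uint32 id followed by the value.
        (pair_len,) = struct.unpack_from("<Q", apk_bytes, pos)
        (pair_id,) = struct.unpack_from("<I", apk_bytes, pos + 8)
        found.append(SCHEME_IDS.get(pair_id, hex(pair_id)))
        pos += 8 + pair_len
    return found
```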

Version 4#

APK signature scheme version 4 is introduced with Android 11 (R) to ease development of larger applications. The APK is signed incrementally, using a Merkle tree. The signing data is stored separately, in an <APKNAME>.idsig file.

Note

Version 4 signatures do not seem to be designed for release purposes. At the moment, JEB does not parse idsig files.

JEB parses v1/v2/v3 signing data. The certificate is displayed as a tree in the UI client:

API

To retrieve this data programmatically: refer to IApkUnit, methods getSignatureSchemeVersionFlags and getSignatureSchemeV{2,3}Block

Manifest#

AndroidManifest.xml defines the Android application to whoever interacts with it, from building, to deployment, to execution.

Important parts of the Manifest:

  • Package name (fully qualified Java name)
  • Requirements to run the app (API level, hardware configs)
  • Permissions required by the app (not all may be granted by the system at t0)
  • Components must be declared in apps - except for Broadcast receivers, which can be registered dynamically
    • Activities (UI elements)
    • Services (background execution)
    • Broadcast Receivers (receive and process events from apps/system)
    • Content Providers (offer data to other apps/system)
  • Declares whether the app is debuggable on a production device <application android:debuggable="false|true" ...

For example, the simple manifest below...

  • declares an app named (internal package name) com.xyz.appcheck
  • requires at least, and ideally targets, API 26 (Android 8.0, Oreo)
  • requests read+write access to storage
  • marks the app as debuggable
  • declares one main activity (visible on launcher)
  • declares one implicit broadcast receiver
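
To make the list above concrete, here is a sketch that extracts the same facts from a decoded manifest. SAMPLE_MANIFEST is a hypothetical manifest written to match the description; apart from the package name and the AppCheck activity, its contents (receiver name, intent actions) are assumptions. Only the Python standard library is used.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Hypothetical decoded manifest matching the description above.
SAMPLE_MANIFEST = """
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.xyz.appcheck">
    <uses-sdk android:minSdkVersion="26" android:targetSdkVersion="26"/>
    <uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE"/>
    <uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE"/>
    <application android:debuggable="true">
        <activity android:name=".AppCheck">
            <intent-filter>
                <action android:name="android.intent.action.MAIN"/>
                <category android:name="android.intent.category.LAUNCHER"/>
            </intent-filter>
        </activity>
        <receiver android:name=".BootReceiver">
            <intent-filter>
                <action android:name="android.intent.action.BOOT_COMPLETED"/>
            </intent-filter>
        </receiver>
    </application>
</manifest>
"""


def android_attr(elem, name):
    """Read an attribute in the android: namespace."""
    return elem.get("{%s}%s" % (ANDROID_NS, name))


def summarize_manifest(xml_text):
    """Collect the manifest facts an analyst typically looks at first."""
    root = ET.fromstring(xml_text.strip())
    app = root.find("application")
    sdk = root.find("uses-sdk")
    components = {
        tag: [android_attr(e, "name") for e in app.findall(tag)]
        for tag in ("activity", "service", "receiver", "provider")
    }
    return {
        "package": root.get("package"),
        "min_sdk": int(android_attr(sdk, "minSdkVersion")),
        "target_sdk": int(android_attr(sdk, "targetSdkVersion")),
        "permissions": [android_attr(e, "name") for e in root.findall("uses-permission")],
        "debuggable": android_attr(app, "debuggable") == "true",
        "components": components,
    }
```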

Note

Manifests can be very complex and lengthy. For example, the primary Facebook app (com.facebook.katana) manifest is well over 2000 lines, mostly Activity descriptions.

About Permissions#

Permissions provide an indirect insight into what "functions" the app needs to perform. They are granted by the user at install and/or runtime:

  • Before API 23, permissions were all granted at install time. A pop-up would display which dangerous permission groups are being requested.
  • With API levels 23+, permissions are granular, and dangerous permissions are granted at run-time
    • unless the Manifest declares targetSdk<23 !
    • the user will be shown a system pop-up
    • permissions can also be revoked in settings

All permissions MUST be declared in the Manifest, whether or not the code requiring them is actually used, and whether they are granted explicitly or implicitly. In other words, an app cannot programmatically request a permission that was not declared in the Manifest in the first place.
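
The rules above can be condensed into a small decision helper. This is a deliberately simplified sketch of the model described in this section (function names are hypothetical), not a description of the platform's actual implementation.

```python
RUNTIME_PERMISSIONS_API = 23  # Marshmallow


def grant_time(device_api, target_sdk, dangerous):
    """Return 'runtime' or 'install' for a declared permission."""
    if (dangerous
            and device_api >= RUNTIME_PERMISSIONS_API
            and target_sdk >= RUNTIME_PERMISSIONS_API):
        return "runtime"
    # Normal permissions, old devices, and apps targeting API < 23.
    return "install"


def can_request(permission, declared_permissions):
    """An app can only request what its Manifest declares."""
    return permission in declared_permissions
```

For example, an app declaring targetSdk 22 still receives its dangerous permissions at install time on an API 29 device.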

Structured Resources#

Structured resources of an app consist of XML files (e.g., app layouts, strings, etc.), image files, icons, etc.

  • XML resources are encoded using a binary format called arsc. The manifest, an XML resource, is encoded as well.
  • Common resources’ information goes into the app resources.arsc file
    • Resources reference resources.arsc items by id
    • They can also reference Android Framework and other vendor-installed framework resources by id (refer to the section 'Third-party Frameworks')
  • JEB always ships with the latest official Android Framework

Note

For additional information on Resources:

  • high-level information can be found in the official doc
  • lower-level details of the arsc format can be found by going through the main implementation of the encoder and decoder, on the AOSP's platform/frameworks/base repository. The newest ResourceTypes.h is located here.

Oddities and Obfuscation#

Resources on Android can be mangled in several ways. JEB unmangles them to the best of its ability. Below, we briefly describe two commonly found obfuscation techniques.

Name removal#

Resource items are normally identified by a name as well as an id. Several application protectors remove resource names from the compiled resources.arsc file when they reference well-known framework resources.

E.g., the manifest below had resource names removed. Note that most XML attribute names are missing.

<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android" :="1.1" android:versionCode="2" platformBuildVersionName="6.0-5078647" platformBuildVersionCode="23" package="com.virginoff.player">
    <uses-sdk :="7" :="23"/>
    <uses-permission :="android.permission.ACCESS_NETWORK_STATE"/>
    <uses-permission :="android.permission.SEND_SMS"/>
    <uses-permission :="android.permission.INTERNET"/>
    <uses-permission :="android.permission.WRITE_EXTERNAL_STORAGE"/>
    <uses-permission :="android.permission.WAKE_LOCK"/>
    <application :="@style/Theme.NoTitleBar.Fullscreen" :="Anal Sex Video" :="@drawable/ic_launcher" :=".Application" :="false">
        <activity :=".activity.WrapperActivity" :="true">
            <intent-filter>
                <action android:name="android.intent.action.MAIN"/>
                <category android:name="android.intent.category.LAUNCHER"/>
            </intent-filter>
        </activity>
        <activity :="o.ϟ" :="0" :="a0"/>
        <activity :="o.ȭ" :="0" :="a0"/>
        <service :="o.Ƃ"/>
    </application>
</manifest>

Dump the manifest using aapt to see the actual ids:

$ aapt dump xmltree 1.apk AndroidManifest.xml
N: android=http://schemas.android.com/apk/res/android
  E: manifest (line=0)
    A: :(0x0101021c)="1.1" (Raw: "1.1")
    A: android:versionCode(0x0101021b)=(type 0x10)0x2
    A: platformBuildVersionName="6.0-5078647" (Raw: "6.0-5078647")
    A: platformBuildVersionCode=(type 0x10)0x17
    A: package="com.virginoff.player" (Raw: "com.virginoff.player")
    E: uses-sdk (line=0)
      A: :(0x0101020c)=(type 0x10)0x7
      A: :(0x01010270)=(type 0x10)0x17
    E: uses-permission (line=0)
      A: :(0x01010003)="android.permission.ACCESS_NETWORK_STATE" (Raw: "android.permission.ACCESS_NETWORK_STATE")
...

Above, we can see that the uses-permission tag, for example, specifies the use of an attribute whose id is 0x01010003.

Attributes of an Android Manifest are well-known resources and stored as such in the Android framework. You can use aapt on the Android framework file to see them:

$ aapt dump --values resources ~/.jeb-android-frameworks/1.apk
Package Groups (1)
Package Group 0 id=0x01 packageCount=1 name=android
  Package 0 id=0x01 name=android
    type 0 configCount=1 entryCount=1543
      spec resource 0x01010000 android:attr/theme: flags=0x40000000
      spec resource 0x01010001 android:attr/label: flags=0x40000000
      spec resource 0x01010002 android:attr/icon: flags=0x40000000
 ---> spec resource 0x01010003 android:attr/name: flags=0x40000000
      spec resource 0x01010004 android:attr/manageSpaceActivity: flags=0x40000000
      spec resource 0x01010005 android:attr/allowClearUserData: flags=0x40000000
...

So, that tag could be restored to:

<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE"/>
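
The lookup performed manually above can be sketched as follows. The id table only contains the few framework attributes visible in the aapt output shown earlier; a real tool would load the full table from the framework file. The function name and the regular expression are illustrative assumptions.

```python
import re

# Subset of framework attribute ids, copied from the aapt dump above.
FRAMEWORK_ATTRS = {
    0x01010000: "theme",
    0x01010001: "label",
    0x01010002: "icon",
    0x01010003: "name",
    0x01010004: "manageSpaceActivity",
    0x01010005: "allowClearUserData",
}

# Matches 'A: :(0x01010003)=...' lines, i.e. attributes with a removed name.
_NAMELESS_ATTR = re.compile(r"^(\s*A: ):\((0x[0-9a-fA-F]{8})\)(.*)$")


def restore_attribute_names(xmltree_dump, table=FRAMEWORK_ATTRS):
    """Re-insert attribute names in an 'aapt dump xmltree' listing."""
    restored = []
    for line in xmltree_dump.splitlines():
        match = _NAMELESS_ATTR.match(line)
        if match:
            name = table.get(int(match.group(2), 16))
            if name is not None:
                line = "%sandroid:%s(%s)%s" % (
                    match.group(1), name, match.group(2), match.group(3))
        restored.append(line)
    return "\n".join(restored)
```

Ids missing from the table are left untouched, so partially restored output remains valid input for a later pass with a larger table.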

Note

The Android framework contains all base system resources for a given version of Android. It is located in /system/framework/framework-res.apk (resources only) on a device, or in platforms/<APILEVEL>/android.jar in the Android SDK. JEB also drops the latest stable framework in your HOME folder, at .jeb-android-frameworks/1.apk.

The above process is automated by JEB to restore XML files to a human-readable state.

Flattened hierarchy#

Although most structured resources (with the notable exception of the Manifest) are typically stored hierarchically under the res folder, they do not have to be. Some application protectors take advantage of this fact to flatten the resources tree, and for example, store them in the APK's root folder.

E.g., in the file below (a protected online banking app), most resource files were renamed to mangled names and stored alongside the Manifest; the res/ folder is present and contains only a handful of resources.

JEB restores both the hierarchy and names of those resource files.

Decoding problems#

Other oddities exist. They can be found in apps that stretch the arsc format specifications to their limits.

They can be used, voluntarily or not, to thwart and crash various open-source tools. We won't detail them here, but you can find additional information on our blog as well as on Apktool's GitHub issue tracker, a prime source of unusual parsing cases.

As an example, here is aapt2 (version around spring 2019) failing on a version of the Facebook app:

$ aapt2 dump Facebook_v153.0.0.54.88.apk
error: trying to add resource 'com.facebook.katana:id/(name removed)' with ID 0x7f090001 but resource already has ID 0x7f090000.

Assets#

Assets are unstructured resources. They can be of any type and stored anywhere in the APK archive. However, the assets/ directory is standard, and used by the Android AssetManager object.

Assets stored in the Resources folder res/raw are stored as-is (in particular, XML files are not encoded), and yet, are accessible in code by id, using the R class, just like any other standard resource.

The asset file below, edd.bin, holds encrypted data:

Native Code#

Android applications often contain native code, compiled as ELF library .so files. They can be located anywhere in the app. SO files can be loaded from bytecode via System.loadLibrary(simpleName) and System.load(path).

A common location for SO files is the app's lib/ folder. Libraries stored in this folder and adhering to the JNI naming convention allow the Android system to unpack the appropriate SO files to the device folder /data/data/<app>/lib. This makes it easier for high-level code to load them: there is no need to implement the logic of figuring out which underlying platform the device is running on.

  • Location: [APK]/lib/<abi>/lib<name>.so
  • Example: the high-level request System.loadLibrary("native-lib") resolves to lib/arm64-v8a/libnative-lib.so on a device with an Arm64 (aarch64) CPU, such as a Pixel phone
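
The naming convention can be expressed in a few lines. The sketch below maps a loadLibrary() name to a bundled path, trying the device ABIs in order of preference. It is a simplification: the function name is hypothetical and the real system selects libraries at install time rather than per call.

```python
def resolve_library(apk_entries, simple_name, supported_abis):
    """Return the lib/<abi>/lib<name>.so entry matching the request, or None.

    supported_abis lists the device ABIs, most preferred first.
    """
    entries = set(apk_entries)
    for abi in supported_abis:
        candidate = "lib/%s/lib%s.so" % (abi, simple_name)
        if candidate in entries:
            return candidate
    return None
```

When analyzing an APK that ships several ABIs, this helps decide which SO file corresponds to the device the app was observed on.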

Info

For more information on native code analysis, debugging and decompilation, refer to the manual pages relative to native code.

Bytecode#

Refer to the DEX sub-section below.


Dex Bytecode#

Dex (or DEX, throughout this section), short for Dalvik Executable, is an object container for Dalvik code. It is Android's equivalent of ELF, PE, COFF, etc. containers for native code.

The primary dex file of an app is named classes.dex, located in the app's root folder. This file is present in the vast majority of apps, although it is optional: apps can be purely native.

DEX splitting#

Additional DEX files may be present: classes2.dex, classes3.dex, etc. The reason behind code splitting is a legacy Dalvik VM limitation called the "64K reference limit": many items present in a DEX file are referenced by an id stored in a 16-bit integer, as is the case for method and field references. To overcome the limitation, compilers such as d8 (or its predecessor dx) split the code over additional DEX files classesN.dex, where N>=2.

Additional references are created to reference definitions located in other DEX files. JEB merges all classesN.dex in a single, virtual DEX unit. Note that in practice, such DEX units could not be converted back to a single DEX file. Also keep in mind that apps may artificially split their code over multiple DEX files; the 64Krefs limit is not a hard requirement.

This app was split over 7 dex files:

Warning

On pre-API 21 (Android 5) systems, it is the responsibility of the app to load additional files - i.e., the DEX splitting mechanism is not something baked into the Dalvik VM itself. Apps can extend the support class MultiDexApplication to avoid implementing their own DEX loader, as the vast majority of apps do. However, keep in mind that this is in no way mandatory. Malware or protected apps can implement multi-dex loading however they see fit.

On API 21 and above, with the advent of the Android Runtime (ART), files named classesN.dex are scanned and pre-compiled along with classes.dex. However, this mechanism does not prevent apps from using additional DEX loading facilities as well.

DEX execution#

There are two general types of dex files:

  • regular dex files contain generic code, use standard Dalvik instructions, meant to run on all Android devices
  • odex files, on the other hand - a generic term for "optimized DEX" - contain device-specific instructions.

Optimized dex (odex)#

The DEX file(s) located in your app is not the code executed on device - except when debugging. DEX code is executed by a runtime:

  • The legacy runtime (pre-API 21 Lollipop) uses a JIT (just-in-time) compiler and generates odex files on first run.
  • The current runtime is named ART (short for Android Runtime) and makes use of AOT (ahead-of-time) compilation of Dalvik to native code (x86 or ARM) at install time.

Note

The format of optimized DEX files has evolved over time ("dey" magic, OAT files with DEX or DEX-like entries, VDEX and CDEX, etc.). The process of reconstructing a DEX from an optimized-DEX is systematic and implemented in several tools, such as baksmali deodex or vdexExtractor. Refer to additional references, such as Lief's notes on OAT and ART, for more information on odex.

Link: List of odex instructions.

DEX format#

This section quickly summarizes important facts about the DEX format. Refer to the official specifications for additional details:

Structure of a DEX file

The format can be linearly represented as:

  • In Header: DEX magic, DEX version: dex\n0NN\0 where NN=
    • 35: before Android 7
    • 37: Android 7, invoke-virtual and invoke-super accept interface methods ids (support for Java 8's default methods)
    • 38: Android 8, added invoke-polymorphic, invoke-custom, call sites and method handles entries (details)
    • 39: Android 9, added const-method-handle, const-method-type (details)
  • Tables are ordered alphabetically and do not allow duplicates
  • Note that the map section was purely redundant until DEX 38 and the introduction of method handles and call sites.
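
As an illustration of the header layout, the sketch below reads the version digits from the magic and the sizes of the main id pools. It assumes the standard 0x70-byte little-endian header described in the official specification, with the pool size/offset pairs starting at offset 0x38. The function name is hypothetical.

```python
import struct

_POOLS = ("string_ids", "type_ids", "proto_ids", "field_ids", "method_ids", "class_defs")


def parse_dex_header(data):
    """Return the DEX version and pool sizes found in the file header."""
    if len(data) < 0x70 or data[:4] != b"dex\n" or data[7:8] != b"\x00":
        raise ValueError("not a DEX file")
    header = {"version": int(data[4:7].decode("ascii"))}
    offset = 0x38  # string_ids_size; each pool is a (size, offset) pair
    for pool in _POOLS:
        size, _pool_offset = struct.unpack_from("<II", data, offset)
        header[pool + "_size"] = size
        offset += 8
    return header
```

Comparing method_ids_size and field_ids_size across the classesN.dex files of an app gives a quick view of how the code was split.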

The three links above should not be overlooked: any Android reverser should strive for Dex and Dalvik proficiency. That being said, below is a list of lesser-known or overlooked details about Dalvik:

  • Strings are encoded using a variant of CESU-8 called MUTF-8 (modified UTF-8)
    • 1-, 2-, or 3-byte encoding (whereas UTF-8 allows up to 4 bytes)
    • Surrogates: 2x3-byte for chars U+10000 to U+10FFFF (whereas canonical UTF-8 does not use surrogates; UTF-16 does)
    • \u0000 is encoded as \xC0\x80 (whereas UTF-8 uses \x00)
    • Special byte \x00 indicates string end (there is no EOS concept with UTF-X)
  • Some 32-bit integers are encoded using the variable encoding scheme LEB128 and its variants
  • Types use Strings: type definition= index into string pool
  • Prototypes use Strings and Types
    • Shorty definition= index into string pool
    • Full prototype definition = list of indices into type pool
  • The Dalvik bytecode is stored in Code items
  • Call Sites and Method Handles were introduced in DEX 38
    • The DEX header remained unchanged and does not directly reference those pools; instead, they are referenced in the Map area (which largely remained unused until those items were introduced)
    • Learn more about DEX 38 on our blog
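
Two of the encodings listed above are easy to get wrong, so here is a minimal sketch of an MUTF-8 encoder and a ULEB128 decoder, written from the public format descriptions. The function names are hypothetical, and the encoder does not append the terminating \x00 byte.

```python
def encode_mutf8(text):
    """Encode a string as MUTF-8: per UTF-16 code unit, on 1 to 3 bytes."""
    out = bytearray()
    utf16 = text.encode("utf-16-le", "surrogatepass")
    for i in range(0, len(utf16), 2):
        unit = utf16[i] | (utf16[i + 1] << 8)
        if 0x0001 <= unit <= 0x007F:
            out.append(unit)
        elif unit <= 0x07FF:
            # Also covers U+0000, which becomes C0 80 (never a raw 00 byte).
            out += bytes((0xC0 | (unit >> 6), 0x80 | (unit & 0x3F)))
        else:
            # Surrogate halves land here: each one is encoded on 3 bytes.
            out += bytes((0xE0 | (unit >> 12),
                          0x80 | ((unit >> 6) & 0x3F),
                          0x80 | (unit & 0x3F)))
    return bytes(out)


def decode_uleb128(data, pos=0):
    """Decode an unsigned LEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the value
            return result, pos
        shift += 7
```

Since an encoded string never contains a raw \x00 byte, that byte can safely serve as the end-of-string marker.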

Dalvik#

Dalvik is the name of the low-level bytecode stored in DEX files. Dalvik bytecode is interpreted by a Virtual Machine (DVM).

Generation:

  • Source language: smali (low-level), Java (high-level), Kotlin (very high-level)
  • Java -> javac -> classfiles (Java bytecode) -> dx/d8 -> classes.dex (Dalvik bc)

Characteristics:

  • Register-based machine:
    • 65,536 32-bit registers, numbered v0 to v65535
    • 65,535 64-bit registers, "emulated" by using consecutive 32-bit registers [v0,v1], [v1,v2], ..., [vN,vN+1]
    • No "special" register is accessible: no flag register, no PC register, no current-frame register, etc.
    • Fixed frames (stack is N/A, no stack pointer), size declared in Code items
    • Pointer= object reference ~= fits on a single register (32-bit)
  • Regular instructions range from 2 to 10 bytes (=1 to 5 words)
    • Instruction opcode encoded on a single byte; the second byte of the first word is generally used to encode register indices
    • nop (1w), const-wide v1, 0x1122334455667788L (5w)
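
The two examples above can be decoded by hand. The sketch below splits the first 16-bit code unit into an opcode (low byte) and an operand byte (high byte), then reads the 64-bit literal of a const-wide instruction. The opcode table is an illustrative subset limited to these two instructions (0x00 and 0x18 in the Dalvik opcode list); the function names are hypothetical.

```python
import struct

# opcode -> (mnemonic, size in 16-bit code units); illustrative subset only
OPCODES = {0x00: ("nop", 1), 0x18: ("const-wide", 5)}


def decode_first_unit(code, pos=0):
    """Return (mnemonic, high byte, size in words) for the instruction at pos."""
    (unit,) = struct.unpack_from("<H", code, pos)
    opcode = unit & 0xFF  # opcode is the low byte of the first word
    high = unit >> 8      # generally a register index
    mnemonic, size = OPCODES[opcode]
    return mnemonic, high, size


def decode_const_wide(code, pos=0):
    """Return (register, signed 64-bit literal) for a const-wide instruction."""
    mnemonic, register, _size = decode_first_unit(code, pos)
    if mnemonic != "const-wide":
        raise ValueError("not a const-wide instruction")
    (value,) = struct.unpack_from("<q", code, pos + 2)  # 4 words of literal
    return register, value
```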

The generally accepted convention is to represent Dalvik disassembly in smali or a variant of smali. By default, JEB uses a variant of Smali, slightly less verbose (more readable and better suited to be displayed and manipulated in an interactive fragment).

  • Method bodies live in isolation, the concept of "jump far" (unstructured dispatch) is irrelevant. Dispatching execution to other methods is done via invoke-xxx instructions only
  • Jumps are always relative to the current PC
  • Retrieving the returned value of a function is done via a move-result-xxx instruction, located right after the invoke-xxx instruction
  • Arithmetic instructions have no side-effects / there is no flag register
  • Data in bytecode is legal:
    • Immediates: Some instructions store literals inline (i.e., within the instruction code), e.g. const-xxx
    • N-way branching instruction: switch-xxx: the jump table is stored within the bytecode
    • Small array initialization: fill-array-data-payload: array data is stored within the bytecode

Calling convention#

The DVM runs managed code and uses a no-side-effect, no-cleanup calling convention: every function gets a clean register slate upon execution; the parameters are stored at the end of the declared frame, in its highest-numbered registers.

Registers are 32-bit wide and noted vX, 0-indexed. The alternate notation pX is used to address registers used to store input method parameters: its indexing starts from frame_size - input_slot_count.

Example 1:

  • Method: void foo(int a, char b, bool c, Object d)
  • The CodeItem declares a frame of size 5
v0
v1 <- parameter 0: p0 (a)
v2 <- parameter 1: p1 (b)
v3 <- parameter 2: p2 (c)
v4 <- parameter 3: p3 (d)
------- end of method frame
v5
v6
...
v65535

Example 2:

  • Method: void bar(float a, long b, double c)
  • The CodeItem declares a frame of size 8
v0
v1
v2
v3 <- parameter 0: p0 (a)
v4 <- parameter 1: p1 (b, lower part)
v5 <- parameter 1: p2 (b, higher part)
v6 <- parameter 2: p3 (c, lower part)
v7 <- parameter 2: p4 (c, higher part)
------- end of method frame
v8
v9
...
v65535
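
The vX/pX mapping used in these examples can be computed mechanically. The sketch below assumes that long and double take two consecutive slots, every other type takes one, and non-static methods receive "this" in p0. The function name and the textual labels are hypothetical.

```python
WIDE_TYPES = {"long", "double"}  # 64-bit types occupy two consecutive registers


def parameter_registers(frame_size, params, is_static=True):
    """params: list of (name, type). Return [(vX, pX, description), ...]."""
    slots = [] if is_static else ["this"]
    for name, type_name in params:
        if type_name in WIDE_TYPES:
            slots += [name + " (low)", name + " (high)"]
        else:
            slots.append(name)
    first = frame_size - len(slots)  # parameters sit at the end of the frame
    if first < 0:
        raise ValueError("frame too small for the declared parameters")
    return [("v%d" % (first + i), "p%d" % i, label)
            for i, label in enumerate(slots)]
```

For a static method with a frame of size 8 and parameters (float a, long b, double c), the five parameter slots map to v3 through v7, leaving v0 to v2 for locals.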

The default settings instruct JEB to use the pX notation when rendering parameter registers:

It can be disabled via a DEX plugin option, also controlled in the UI: right-click, select Rendering Properties, and untick 'Use p for parameters'.

Smali and variants#

The JEB notation is made possible by the interactivity layer (as opposed to a static dead listing). Two notable differences:

  • For readability, the names are simple names, no longer fully-qualified
  • Invoke opcodes place the arguments after the method: invoke-xxx callsite, args instead of invoke-xxx {args}, fully_qualified_callsite

Below, the default assembly code representation used by JEB (smali variant):

Official smali code can be generated (it is useful if code needs to be exported, and later on compiled using smali.jar):


Application components#

The application and its main components (activities, broadcast receivers, services, content providers) must be declared in the Manifest, even if they are not meant to be exported (i.e., the exported attribute is set to false).

Warning

There is one exception: broadcast receivers can be registered dynamically, via one of the Context.registerReceiver() methods.

The name attribute of a component is the name of the corresponding class that will be instantiated and whose methods are to be called back by the Android system (for exported components).

Example: the Manifest below declares an Activity class com.xyz.appcheck.AppCheck, among other components.

Entry points#

The exported components of an APK can be equated to the entry-points of the application. Therefore, unlike other types of executable programs, an Android application can have multiple entry-points.

The true entry-point of a non-native application is the application's static initializer (Application.<clinit>). If the application being analyzed declares its own Application class in the manifest (instead of reusing android.app.Application), then that class's static initializer, if any, should be looked at first, followed by the constructors.

Similarly, the static initializers and constructors of activities, receivers, services and providers are also entry-points.

Finally, all API-defined callback methods of those five components can be called back by the system. Typically, the main activity's onCreate() method is the practical entry-point to an application, akin to a regular program's main() routine.

Note

JEB's disassembly view recaps the most important features of the APK at the top of the code listing. The important components, in particular any custom Application object, will be mentioned there:

Activities#

Activities are activated by Intent.

Exported activities (historically, an activity that declares at least one intent filter is exported by default) are first-class entry-points. The following methods should be carefully examined:

  • constructor (although the object state is uncertain)
  • well-known callbacks, e.g. attachBaseContext (used to set up a delegate Activity), onCreate, onResume, etc.

Pseudo activities#

Be mindful of activity aliases. They are not true components, however, they can and will override their target component's characteristics, such as intent filters.

Services#

Services are activated by Intent.

Services are started by Activity code or Receiver code (after receiving a particular event). Most won't be easily started by the user, except for a few of them, such as Input services. Therefore, they should not be considered first-class entry-points from the point of view of code analysis.

Broadcast Receivers#

Broadcast receivers are activated by Intent.

The intent handler for ACTION_BOOT_COMPLETED is a common entry-point, frequently used by malicious code to start automatically after the phone has booted up. Many more exist, though (e.g., battery plugged, message received, phone lifted, etc.).

Caveats:

  • API 21+ (Lollipop) - need a wake lock
  • API 26+ (Oreo) - need a JobIntentService

Content Providers#

Content providers are activated by ContentResolver.

Intents#

Intents are the primary method for inter-process and inter-app communication. Other IPC means exist, e.g. sockets, files, etc.

Intents are used to activate components, e.g. start an activity.

Reference


Dalvik and Disassembly#

Plugins generate IUnit objects. Units can generate documents abiding by a standard interface, making them easy to render by clients implementing the JEB API. The primary document generated by an IDexUnit is a text document representing the disassembly of the input DEX files - or merged DEX files.

Auto dex merging#

Merging is done automatically, regardless of index limitations in place that may have required dex-splitting in the first place.

In rare cases, you may want to disable merging. Upon loading an APK, several APK plugin options will be presented to you. Untick Merge multi-dex to disable auto-merging:

DEX parsing options#

DEX plugin options are accessible in the Engines options UI panel. Filter on "dex." to list them.

You will also be presented with a Processing Properties dialog box when opening a new file, giving you a chance to adjust the default options stored in your jeb-engines.cfg file.

Addressing#

All addresses of types, methods and fields use the canonical JVM notation, e.g.:

  • type Blah in package com.abc: Lcom/abc/Blah;
  • method foo(int):void in the type com.abc.Blah: Lcom/abc/Blah;->foo(I)V
  • field name:String in the type com.abc.Blah: Lcom/abc/Blah;->name:Ljava/lang/String;

JEB extends the method notation to reference code locations via a suffix +OFFSET.

  • Reference the instruction at offset 0x20 in the internal method foo(): Lcom/abc/Blah;->foo(I)V+20h
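
When scripting, it can be convenient to split such addresses into their parts. The sketch below is a hypothetical helper based only on the notation shown above (including the +OFFSET suffix written in hexadecimal with a trailing h); it is not part of the JEB API and does not validate descriptors.

```python
import re

_ADDRESS = re.compile(
    r"^(?P<type>L[^;]+;)"                      # Lcom/abc/Blah;
    r"(?:->(?P<name>[^:(+]+)"                  # ->foo or ->name
    r"(?:(?P<proto>\([^)]*\)[^+]+)"            # (I)V for methods
    r"(?:\+(?P<offset>[0-9A-Fa-f]+)h)?"        # optional +20h code offset
    r"|:(?P<ftype>.+)))?$"                     # :Ljava/lang/String; for fields
)


def parse_address(address):
    """Split a type, method, field or code address into its components."""
    match = _ADDRESS.match(address)
    if match is None:
        raise ValueError("unrecognized address: %r" % address)
    result = {"type": match.group("type")}
    if match.group("proto"):
        result.update(kind="method", name=match.group("name"),
                      descriptor=match.group("proto"))
        if match.group("offset"):
            result["offset"] = int(match.group("offset"), 16)
    elif match.group("ftype"):
        result.update(kind="field", name=match.group("name"),
                      descriptor=match.group("ftype"))
    else:
        result["kind"] = "type"
    return result
```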

Types Naming Conventions#

Historically, addressing in Java can be quite confusing. There exist three types of notation:

  • The JVM notation (canonical representation) is the one used by DEX's TypeDescriptors, e.g. Ljava/lang/Object;. JEB's DEX plugins use and provide JVM canonical names.
  • Two JLS notations:
    • binary form, uses dots to separate package names, e.g.: java.lang.Object
    • internal binary (or just, 'internal') uses slashes, e.g.: java/lang/Object
  • The java.lang.Class API uses inconsistent representations, mostly the binary form.

Android plugins use the JVM notation internally and externally (public API).
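
Converting between the notations is mechanical for class types. The helpers below are a minimal sketch with hypothetical names; they deliberately reject anything that is not a plain class descriptor (primitives and arrays are out of scope).

```python
def _check_class_descriptor(descriptor):
    if not (descriptor.startswith("L") and descriptor.endswith(";")):
        raise ValueError("not a class type descriptor: %r" % descriptor)


def jvm_to_binary(descriptor):
    """Ljava/lang/Object; -> java.lang.Object"""
    _check_class_descriptor(descriptor)
    return descriptor[1:-1].replace("/", ".")


def jvm_to_internal(descriptor):
    """Ljava/lang/Object; -> java/lang/Object"""
    _check_class_descriptor(descriptor)
    return descriptor[1:-1]


def binary_to_jvm(name):
    """java.lang.Object -> Ljava/lang/Object;"""
    return "L" + name.replace(".", "/") + ";"
```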

Example: to reference a method void foo(String) in type a.b.c, its JVM address should be provided: La/b/c;->foo(Ljava/lang/String;)V. API methods using DEX addresses expect canonical addresses. The graphical client is less strict: although it also expects canonical addresses by default, it implements fall-back mechanisms that try to determine which address the user may have intended and whether it matches an existing item.

See our Reference Document.

Disassembly#

The default output uses modified smali (as explained in the previous section).

TODO

This section is a work-in-progress.


Renaming and Refactoring#

The DEX plugin provides smart renaming capabilities: renaming of items is done consistently throughout the code base.

Most internal items can be renamed (internal items are those defined in the DEX, as opposed to external items, simply referenced in the DEX, but whose definition is located elsewhere). What can be renamed:

  • class names (as well as interfaces and enums defined in the DEX)
  • method names
  • field names
  • labels (addresses)
  • package names
  • variables (in decompiled units)

In the UI client, Renaming can be done via the Action menu.

Renaming Methods#

Proper refactoring is especially important when renaming non-static methods (virtual methods, interface methods) which may be part of a hierarchy of other methods, overridden parents or overriding children.

Example: renaming B.a() should also rename A.a() and C.a()

A.a():Object
|
B.a():String
|
C.a():String

Renaming must also take into account indirect references to methods, something common when invoking virtual methods located in other classes, but which can be used to obfuscate a program further.

A.a()
|
B.a() - within B: invoke-virtual A.a() -> actually refers to B.a()

Things can get relatively complicated when hierarchies involve multiple inheritance through one or more interfaces.
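To make the idea concrete, the following Python sketch computes which method declarations must be renamed together in a toy hierarchy. It is a simplified model under stated assumptions, not the algorithm used by the DEX plugin: methods are identified by a plain key such as "a()" (name and parameters only), all methods are assumed virtual and non-private, and the function name rename_group is hypothetical.

```python
from collections import deque


def rename_group(supertypes, declared, start_class, method_key):
    """Return the classes whose declaration of method_key is tied to start_class's.

    supertypes: class -> list of direct supertypes (superclass and interfaces)
    declared:   class -> set of method keys declared in that class
    """
    cache = {}

    def has_method(cls):
        # A class "has" the method if it declares it or inherits it.
        if cls not in cache:
            cache[cls] = method_key in declared.get(cls, ()) or any(
                has_method(s) for s in supertypes.get(cls, ()))
        return cache[cls]

    # Reverse edges, so the hierarchy can be walked in both directions.
    subtypes = {}
    for cls, supers in supertypes.items():
        for sup in supers:
            subtypes.setdefault(sup, []).append(cls)

    seen = {start_class}
    queue = deque([start_class])
    while queue:
        cls = queue.popleft()
        neighbors = list(supertypes.get(cls, ())) + subtypes.get(cls, [])
        for other in neighbors:
            # Only cross into types that also have the method.
            if other not in seen and has_method(other):
                seen.add(other)
                queue.append(other)
    return sorted(c for c in seen if method_key in declared.get(c, ()))


supertypes = {"B": ["A"], "C": ["B"], "D": ["A"]}
declared = {"A": {"a()"}, "B": {"a()"}, "C": {"a()"}, "D": {"b()"}}
print(rename_group(supertypes, declared, "B", "a()"))  # ['A', 'B', 'C']
```

Because supertypes may list interfaces as well as a superclass, the same walk also links methods that are related only through a shared interface, which is where real hierarchies become harder to reason about by hand.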

Renaming Fields#

To some degree, the same applies to fields. Unlike methods, fields cannot be overridden; they are always accessed in a direct way, i.e. resolution is entirely determined at compile-time. However, field masking adds complications.

A.i:int public
|
B.i:int public - this field masks A.i

Shall a rename action of A.i also rename B.i? Conversely, shall renaming B.i also rename A.i? Theoretically, no. Those two fields are not related. However, at the moment, the DEX plugin renames both fields to maintain some degree of visual consistency with the original binary, regardless of whether field masking was done voluntarily or not, with obfuscating intent or not.

This is likely to change with the addition of an option to let users decide how they want to perform renaming.

Reorganizing code#

Packages can also be renamed. Users can also create additional packages and move packages and classes to other packages.

This feature can come in handy when dealing with obfuscated code for which entire type hierarchies were flattened out into a handful of packages (sometimes a single one). Most obfuscators, including the ubiquitous ProGuard and its successor R8, can do that.

If you want to explore refactoring further, in particular hierarchy reconstruction, have a look at our sample script DexCluster.py.

Auto-rename#

Auto-renaming is an optional standard Action also implemented by the DEX plugin. As its name implies, it can perform automatic blanket renaming of all items of a unit. In the case of DEX units, they include types, methods, fields, etc.

The action is semi-customizable by each plugin, as can be seen in the API. In the case of the DEX plugin, three policies are implemented. Each policy defines a set of legal characters for items. The loosest policy permits all characters in the printable ASCII range (something already relatively strict considering the actual legal character ranges for Java identifiers); the strictest considers all current names (except those that were already defined) invalid, and therefore, will rename everything.
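The notion of a policy can be sketched as a predicate deciding whether a name is legal, combined with a generator of replacement names. The Python code below is an illustrative model only: the function names, the "item0000" naming scheme, and the exact printable range (0x20 to 0x7E) are assumptions, and the sketch does not model the exception for names that were already defined.

```python
def is_printable_ascii(name):
    """Loose policy (assumed range): every character is printable ASCII."""
    return len(name) > 0 and all(0x20 <= ord(ch) <= 0x7E for ch in name)


def reject_all(name):
    """Strict policy: every current name is considered invalid."""
    return False


def auto_rename(names, is_legal, prefix="item"):
    """Map each illegal name to a generated replacement (hypothetical scheme)."""
    mapping = {}
    for name in names:
        if not is_legal(name) and name not in mapping:
            mapping[name] = "%s%04d" % (prefix, len(mapping))
    return mapping


# The second name starts with a right-to-left override character.
names = ["MainActivity", "\u202e\u05d0\u05d1", "a"]
loose = auto_rename(names, is_printable_ascii)   # renames only the RTL name
strict = auto_rename(names, reject_all)          # renames all three names
print(len(loose), len(strict))
```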

The obfuscation applied below is using right-to-left unicode characters to complicate rendering:

After auto-renaming (standard policy):


Cross-references in Dalvik#

Refer to the generic xrefs section of the manual.

TODO

The documentation of DEX specificities regarding xrefs is a WIP.


Decompilation to Java#

Note

Make sure to read the generic Decompilation section of the manual.

Decompiling classes#

By default, the Decompile action in the UI menu triggers the decompilation of an entire class and its constituents (fields, methods, member classes, etc.).

  • Fresh decompilation:
    • With the current options set up in your Engines context: Use Action, Decompile or press the Tab key.
    • With custom options: use Action, Decompile with Options or press CMD1+Tab.
  • Re-decompile (e.g., after changing options)
    • Execute a "Decompile with Options" action as described above. The current decompilation of the class, if it existed, will be discarded, and a new decompilation will take place.

Generic decryption and deobfuscation#

The decompiler attempts to automatically perform data decryption and code unreflection. This process is carried out by several method optimizers managed by the decompiler, with the help of code emulation in a built-in sandbox.

Sample malware code decompilation: light-blue methods have been unreflected; Purple strings are the result of generic decryptions

While the emulator and sandbox are currently not available in the API, a few key parameters of the emulator can be customized via the coreplugins/dexdec-emu.cfg file.

  • Copy the file dexdec-emu.cfg.TEMPLATE to dexdec-emu.cfg
  • Edit dexdec-emu.cfg
  • The changes will take effect at the next decompilation

Currently, the configuration file allows users to specify:

  • maximum emulation times
  • emulation policy for external methods (by groups, restricted lists, whitelists, and blacklists - e.g., a user can forbid the emulation of any time/date-related method)

Note

As of April 2020, data decryption combined with unreflection yields very effective results against most classes of Dalvik obfuscators. You will find examples of this on our blog.

Known support#

As of April 2020, the following well-known obfuscators were tested (*) and their output verified to be deobfuscated or partially deobfuscated:

  • appguard
  • arxan
  • dash-o
  • dexguard
  • dexprotector
  • several vendor-specific custom protectors
  • several custom protectors used by malicious applications, e.g. Joker
  • many java classfile (as opposed to targeting dex specifically) protectors, e.g. allatori

(*) The "test process" is not well-defined or exhaustive. It is partly automated as well as user-guided (therefore, feel free to reach out to us if you're seeing the generic decryptor choking on some obfuscation that you think should or could be supported.)

Exceptional control flow#

The decompilation of code protected in try blocks (try/catch+/finally?) is enabled by default.

However, reconstruction of try-with-resources (also known as ARM, for Automatic Resource Management) is more limited. This very high-level Java construct translates into complicated, lengthy, compiler-generated optimized code.

Note

Better support for try-with-resources reconstruction is a planned addition.

Recovering enums#

Enumerations in Java are a high-level construct that translates into multiple classes and synthetic methods. JEB attempts to discover and re-sugar those enumeration artifacts into the original enum. On failure, regular classes extending java.lang.Enum will be generated.

Note

Enum reconstruction can be disabled in the Options.

Enums are great candidates for obfuscation, and most Android protectors do obfuscate them. That process destroys important synthetic fields and structures that would allow simple recovery heuristics to work. However, support should function reasonably well, even on enumeration data that was intentionally shuffled to generate decompilation errors.

Note that enumerated fields can be renamed. Renaming is done consistently over the code base, including over reconstructed switches making use of such enums.

Decompiled enums in android.arch.lifecycle. Renaming and cross-referencing enumerated constants is supported.

Custom enumerated constants should also be properly reconstructed, including:

  • Field annotations
  • Custom initializers (see below)
  • Additional methods and method overrides

In this complex enumeration, the red block shows a custom initializer. Other interesting bits are the use of overrides and custom methods, annotations, as well as default and non-default constructors.

Recovering switches#

The detection and reconstruction of switch-on-enum and switch-on-string are supported.

Reconstruction of switch-on-string can be very complicated depending on how the compiler has generated and optimized the code, and therefore, is limited to simple cases.

This successfully reconstructed switch-on-string is implemented as a double-switch idiom by dx (a sparse switch on hashCode/equals to generate custom indices i, followed by a packed-switch on i). Not all switches are implemented like this. Regular if-conditional trees may be strategically generated by optimizing compilers.
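The double-switch idiom can be modeled outside of Dalvik. The Python sketch below mimics the two stages: a first dispatch on the string's hash code, confirmed by an equality test, produces an index; a second dispatch on that index selects the case body. The case labels are made up for the illustration, and java_string_hash is a re-implementation of Java's String.hashCode() (32-bit wrap-around over UTF-16 code units), not a JEB function.

```python
def java_string_hash(s):
    """Re-implementation of java.lang.String.hashCode() as a signed 32-bit value."""
    h = 0
    data = s.encode("utf-16-be")
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]   # one UTF-16 code unit
        h = (31 * h + unit) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h


def dispatch(s):
    # Stage 1: sparse switch on hashCode(), each hit confirmed with equals().
    index = -1
    h = java_string_hash(s)
    if h == 2112:            # "Aa" and "BB" share this hash code
        if s == "Aa":
            index = 0
        elif s == "BB":
            index = 1
    elif h == 99162322:
        if s == "hello":
            index = 2
    # Stage 2: packed switch on the computed index.
    if index == 0:
        return "case Aa"
    if index == 1:
        return "case BB"
    if index == 2:
        return "case hello"
    return "default"


print(dispatch("BB"))     # case BB
print(dispatch("other"))  # default
```

The collision between "Aa" and "BB" shows why the equality test in the first stage cannot be dropped, and why a decompiler must recognize both stages together to rebuild the original switch.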

Note

Better support for switch-on-string reconstruction is a planned addition.

Member classes and arguments capturing#

Properly rendering non-trivial member classes (particularly non-static named classes or anonymous classes) is made difficult by the fact that some of their arguments are captured from the outer class(es). Properly rendering anonymous constructors, with exact argument types and position, is also challenging.

In the example below, an anonymous class initializer is used to hide string decryption code:

  • The anonymous class extends Android’s OnActivityResultListener, instantiates the object, and tosses it immediately.
  • Decryption code takes place in the initializer. Note the captured arguments from the outer container method __m: i, _b. Access to other private class fields is made via synthetic accessor calls that were re-sugared into seemingly direct field access (BA._b).

Pseudo-moot anonymous class with an instance initializer attempting to conceal string decryption code.

Lambdas generation#

By default, JEB will try to recover and reconstruct lambdas.

Desugared Lambdas#

Recovery and reconstruction do not rely on any type of metadata 1, such as special prefixes -$$Lambda$ for classes and methods implementing desugared lambdas in dex 37-.

You may therefore see constructs like this:

This DEX file contains desugared, non-obfuscated lambdas.

This DEX file contained desugared, obfuscated lambdas

API

In the above cases, the underlying Java AST may be an IJavaNew or IJavaStaticField node. This is not the case for real (not desugared) lambdas. They will map to an IJavaCall node.

Lambda reconstruction can be disabled in the options. Lambda rendering can also be disabled in the options, as well as on-demand by right-clicking a decompiled view, Rendering Options.

Lambdas options

Real Lambdas#

Lambda reconstruction also takes place when the code has not been desugared (which is rare!), i.e. code relying on dex38’s invoke-custom and invoke-polymorphic.

This DEX file contains real lambdas implemented via invoke-custom

API

Such lambdas map to an IJavaCall node for which isLambdaCall() will return true.

Dynamic invocation opcodes#

The translation of invoke-custom whose bootstrap method is LambdaMetafactory.metafactory(...) allows the decompiler to generate proper Java code with lambda constructs.

However, this is just one (albeit one of the most important) case of dynamic dispatch. invoke-custom and related opcodes (const-method-handle, const-method-type) cannot be as "easily" translated into intermediate representations - and later on, AST. For that reason, those opcodes are translated to regular invocations of artificial methods.

Artificial classes in jeb.synthetic#

Classes in the jeb.synthetic package are generated automatically by the DEX decompiler:

  • InvokeCustoms contains static methods representing dynamic dispatch to a method handle's callsite done via an invoke-custom opcode: jeb.synthetic.InvokeCustoms.CallSite<INDEX>_<DynamicName>(DynamicPrototype)

  • PooledMethodHandles contains static getters of method handles stored in a DEX pool and retrieved via a const-method-handle opcode: jeb.synthetic.PooledMethodHandles.Entry<INDEX>_<MethodName|FieldName>() : java.lang.invoke.MethodHandle

  • PooledMethodTypes contains static getters of method types stored in a DEX pool and retrieved via a const-method-type opcode: jeb.synthetic.PooledMethodTypes.Entry<INDEX>() : java.lang.invoke.MethodType

Decompiling Java Bytecode#

JEB supports JLS bytecode decompilation for *.class files and *.jar-like archives (jar, war, ear, etc.). The Java bytecode is converted to Dalvik using Android's dx by default. JEB falls back to d8 if a problem occurs. Users may choose to use d8 first instead by selecting so in the Options.

The resulting DEX file(s) are processed as usual.

You may use this to decompile Android Library files (*.aar files) in JEB.

Examining the android-arch-core-runtime library


Debugging Apps#

Refer to the next section, Android Debugging.


Miscellaneous#

Bulk decompilation and Export#

Bulk decompilation and export to *.java files on disk can be done in the UI client via the menu command File, Export, DEX Fast Decompilation.

You may also provide a regular expression filter if you'd like to restrict decompilation to a package and its sub-packages, e.g. com\.xyz\..* will decompile all classes in com.xyz and its sub-packages.
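The effect of such a filter can be checked ahead of time. The Python sketch below applies a pattern to a list of made-up class names; it assumes the filter must match the whole fully-qualified, dot-separated class name, which may differ from how JEB applies it. The helper name select_classes is hypothetical.

```python
import re


def select_classes(class_names, pattern):
    """Keep the class names that the pattern matches in full."""
    rx = re.compile(pattern)
    return [name for name in class_names if rx.fullmatch(name)]


classes = ["com.xyz.Main", "com.xyz.net.Client", "com.xyzw.Other", "org.abc.Util"]

# Escaped dots: only com.xyz and its sub-packages are selected.
print(select_classes(classes, r"com\.xyz\..*"))
# ['com.xyz.Main', 'com.xyz.net.Client']

# Unescaped dots match any character, so com.xyzw.Other slips through.
print(select_classes(classes, r"com.xyz..*"))
# ['com.xyz.Main', 'com.xyz.net.Client', 'com.xyzw.Other']
```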

Refer to this note for generic details on exporting output.

Callgraph fragment#

The callgraph fragment is not specific to Android (most code analyzers can generate one). It is an experimental feature designed to represent trimmed callgraphs of the most important routines and most important invocations between those routines.

The fragment is located in the lower right-hand corner of a standard workspace. The callgraph is not generated by default:

Empty callgraph

Click on the fragment to generate it. The navigation shortcuts are similar to those used in the CFG fragment ('\' to center, '[' and ']' to zoom in and out, etc.). Click a node to set the focus on the associated routine and its connections. Double-click a node to jump to that routine in the disassembly listing.

A generated callgraph

Displaying synthetic items#

How to always display synthetic fields and methods in decompiled views? In the vast majority of the cases, synthetic accessors used by inner classes need not be displayed as they are re-optimized into direct, seamless outer class field access or method invocation.

However, if you wish to display them: In a decompiled view, right-click, "Rendering Options" and tick the boxes "Generate synthetic fields" and "Generate synthetic methods". You may also change this setting once and for all in the Engines option (Edit, Options, Engines).

Example: forcing rendering of Synthetic Fields

Third-party Frameworks#

When analyzing applications using resources located in frameworks other than the Android Framework (e.g. the Samsung framework), follow these steps:

  • Retrieve the custom framework archive using adb pull. It is normally stored somewhere in the device's /system/framework/ folder. Let's call it framework.zip.
  • Run aapt2 dump framework.zip and retrieve the first line, which will be something like Package name=xxxxxxx id=N. Note the id, N.
  • Navigate to the folder listed in your .parsers.apk.FrameworksDirectory engines property. Typically, it will be the HOME_FOLDER/.jeb-android-frameworks folder
  • Copy framework.zip into this folder, and rename it to N.zip
  • JEB should now be able to pick up that framework and use its resources when needed

Note

1.zip in the FrameworksDirectory folder is the Android framework itself, which has id 1.