The latest JEB release ships with our all-new Android resources (ARSC) decoder, designed to reliably handle tweaked, obfuscated, and sometimes malformed resource files.
As it appears that optimizing resources for space (eg, the WeChat team has made their compressor/refactoring module publicly available, etc.) or complexity (eg, commercial app protectors have been doing it for some time now) is becoming more and more commonplace, we hope that our users will come to appreciate this new module.
Here are the key points, followed by examples of what to expect from the new module.
ARSC Decoder Workflow
In terms of workflow, nothing changes: starting with JEB 2.3.10, the new Android Resources decoder module is enable by default.
If you ever need to switch back to the legacy module, simply open the Options, Advanced panel, filter on AndroidResourcesDecoderSelector and set the value to 1 (instead of 2).
ARSC Decoder Output
In terms of output, users should see improvements in at least three areas:
First, the module can deal with obfuscated resources and malformed files better, resulting in lower failure rates. Ideally, we’d like to get as close as possible to a 0-failure, so please report issues!
Second, flattened, renamed, or generally refactored resources are handled as well, and the original res/ folder will be reconstructed, resulting in a readable Resources sub-tree.
Finally, the module can generate an aapt2-like text output to cope with the limitations of AOSP’s aapt/aapt2 (eg, crashes); the output can be quite large, so currently, aapt2-like output generation is disabled by default. To enable it, go to the Options, Advanced panel, filter on AndroidResourcesGenerateAapt2LikeOutput and set the value to true. The output will be visible as an additional fragment of the APK unit view:
Additional Input (APK Frameworks)
By default, the latest Android framework (currently API 27) is dropped by JEB in [HOME_FOLDER]/.jeb-android-frameworks/1.apk.
If an app you are analyzing requires additional framework libraries, drop them as [package_id].apk in that folder, and you should be good to go.
Example 1: flattened resources in a banking app
Here’s a sample that demonstrates what the output looks like with an app found on VirusTotal. The app is called itsme and is produced by the ING bank. It may have been obfuscated by GuardSquare’s DexGuard 1, which performs extensive resources refactoring (res/ folder flattening) and trimming (renaming of files, name-less resource objects, etc.).
Have a look at the APK contents:
aapt2 fails on it (resource id overlap):
error: trying to add resource 'be.bmid.itsme:attr/' with ID 0x7f010001 but resource already has ID 0x7f010000.
apktool 2.3.1 cannot reconstruct the resource tree either. Resources are moved to an unknown/ folder; on non-Linux system, resources manipulation also fail due to illegal character names.
JEB does its best to rebuild the resources tree, and renames illegally named resources as well across the Resources base, consistently:
Example 2: tweaked xml
The second file is a version of the Xapo Bitcoin wallet app 2, also found on VirusTotal. This app does not fail aapt2, however, it does fail other tools, including apktool 2.3.1
I: Using Apktool 2.3.1 on 96cbabe2fb11c78a283348b2f759dc742f18368e0d65c5d0a15aefb4e0bdc645
I: Loading resource table...
I: Decoding AndroidManifest.xml with resources...
I: Loading resource table from file: [...]/1.apk
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 8601
The resources are flattened and renamed; the XML resources are oddly structured and stretch the XML specifications as well.
JEB handles things smoothly.
There are many more examples of “stretched” resources in APKs we’ve come across, however we cannot share them at the moment.
If you come across unsupported scenarios or bugs, feel free to issue a report, we’ll happily investigate and update the module.
Last week was the release of JEB 2.3.7 with a brand new parser for Mach-O, the executable file format of Apple’s macOS and iOS operating systems. This file format, like its cousins PE and ELF, contains a lot of technical peculiarities and implementing a reliable parser is not a trivial task.
During the journey leading to this first Mach-O release, we encountered some interesting executables. This short blog post is about one of them, which uses some Mach-O features to make reverse-engineering harder.
Interestingly, there are some sections related to the Objective-C language (“__objc_…”). Roughly summarized, Objective-C was the main programming language for OS X and iOS applications prior the introduction of Swift. It adds some object-oriented features around C, and it can be difficult to analyze at first, in particular because of its way to call methods by “sending messages”.
Nevertheless, the good news is that Objective-C binaries usually come with a lot of meta-data describing methods and classes, which are used by Objective-C runtime to implement the message passing. These metadata are stored in the “__objc_…” sections previously mentioned, and the JEB Mach-O parser process them to find and properly name Objective-C methods.
After the initial analysis, JEB leaves us at the entry point of the program (the yellow line below):
Wait a minute… there is no routine here and it is not even correct x86-64 machine code!
Most of the detected routines do not look good either; first, there are a few objective-C methods with random looking names like this one:
Again the code makes very little sense…
Then comes around 50 native routines, whose code can also clearly not be executed “as is”, for example:
Moreover, there are no cross-references on any of these routines! Why would JEB disassembler engine – which follows a recursive algorithm combined with heuristics – even think there are routines here?!
Time for a Deep Dive
Code Versus Data
First, let’s deal with the numerous unreferenced routines containing no correct machine code. After some digging, we found that they are declared in the LC_FUNCTION_STARTS Mach-O command – “command” being Mach-O word for an entry in the file header.
This command provides a table containing function entry-points in the executable. It allows for example debuggers to know function boundaries without symbols. At first, this may seem like a blessing for program analysis tools, because distinguishing code from data in a stripped executable is usually a hard problem, to say the least. And hence JEB, like other analysis tools, uses this command to enrich its analysis.
But this gift from Mach-O comes with a drawback: nothing prevents miscreants to declare function entry points where there are none, and analysis tools will end up analyzing random data as code.
In this binary, all routines declared in LC_FUNCTION_STARTS command are actually not executable. Knowing that, we can simply remove the command from the Mach-O header (i.e. nullified the entry), and ask JEB to re-analyze the file, to ease the reading of the disassembly. We end up with a much shorter routine list:
The remaining routines are mostly Objective-C methods declared in the metadata. Once again, nothing prevents developers to forge these metadata to declare method entry points in data. For now, let’s keep those methods here and focus on a more pressing question…
Where Is the Entry Point?
The entry point value used by JEB comes from the LC_UNIXTHREAD command contained in the Mach-O header, which specifies a CPU state to load at startup. How could this program be even executable if the declared entry point is not correct machine code (see Figure 2)?
Surely, there has to be another entry point, which is executed first. There is one indeed, and it has to do with the way the Objective-C runtime initializes the classes. An Objective-C class can implement a method named “+load” — the + means this is a class method, rather than an instance method –, which will be called during the executable initialization, that is before the program main() function will be executed.
If we look back at Figure 5, we see that among the random looking method names there is one class with this famous +load method, and here is the beginning of its code:
Finally, some decent looking machine code! We just found the real entry point of the binary, and now the adventure can really begin…
That’s it for today, stay tuned for more technical sweetness on JEB blog!
AppSolid is a cloud-based service designed to protect Android apps against reverse-engineering. According to the SEworks’ website, AppSolid is both a vulnerability scanner as well as a protector and metrics tracker. 1 This post focuses on the protector part only.
Note: AppSolid offers various protection options. However, based on our limited testing, it turns out that protected apps can be easily un-protected. The founder of SEworks said on the ProductHunt’s description page: “We looked at Proguard and Dexguard for our own app, and source code obfuscation just wasn’t enough (plus that makes crash logs hard to read), and solutions like Arxan cost a lot of money and required weeks of our time to integrate into the app. ” While the above is essentially true, the implication is that AppSolid fares better than “source code obfuscation”. He adds: “Why not create a shield that goes around the app to protect it, and is easy to use?” – that is the core issue that AppSolid is facing. A commenter added that “(AppSolid) does a good job of protecting against source code analysis through app decompiling”. That claim is false, but remained undisputed on the ProductHunt page.
This blog shows how to retrieve the original bytecode of a protected application. Grab the latest version of JEB (2.2.5, released today) if you’d like to try this yourself.
Once protected, the Android app is wrapped around a custom DEX and set of SO native files. The manifest is transformed as follows:
The package name remains unchanged
The application entry is augmented with a name attribute; the name attribute references an android.app.Application class that is called when the app is initialized (that is, before activities’ onCreate)
The activity list also remain the same, with the exception of the MAIN category-filtered activity (the one triggered when a user opens the app from a launcher)
A couple of Appsolid activity are added, mainly the com.seworks.medusah.MainActivity, filtered as the MAIN one
Note that the app is not debuggable, but JEB handles that just fine on ARM architectures (both for the bytecode and the native code components). You will need a rooted phone though.
The app structure itself changes quite a bit. Most notably, the original DEX code is gone.
An native library was inserted and is responsible for retrieving and extracting the original DEX file. It also performs various anti-debugging tricks designed to thwart debuggers (JEB is equipped to deal with those)
A fake PNG image file contains an encrypted copy of the original DEX file; that file will be pulled out and swapped in the app process during the unwrapping process
Upon starting the protected app, a com.seworks.medusah.app object is instantiated. The first method executed is not onCreate(), but attachBaseContext(), which is overloaded by the wrapper. There, libmd is initialized and loadDexWithFixedkeyInThread() is called to process the encrypted resources. (Other methods and classes refer to more decryption routines, but they are unused remnants in our test app. 2)
The rest of the “app” object are simple proxy overrides for the Application object. The overrides will call into the original application’s Application object, if there was one to begin with (which was not the case for our test app.)
The remaining components of the AppSolid DEX file are:
Setters and getters via reflection to retrieve system data such as package information, as well as stitch back the original app after it’s been swapped in to memory by the native component.
The main activity com.seworks.medusah.MainActivity, used to start the original app main activity and collect errors reported by the native component.
The protected app shipped with 3 native libraries, compiled for ARM and ARM v7. (That means the app cannot run on systems not supporting these ABIs.) We will focus on the core decryption methods only.
As seen above, the decryption code is called via:
m = new MedusahDex().LoadDexWithFixedkeyInThread(
Briefly, this routine does the following:
Retrieve the “high_resolution.png” asset using the native Assets manager
Decrypt and generate paths relative to the application
Permission bits are modified in an attempt to prevent debuggers and other tools (such as run-as) to access the application folder in /data/data
Decrypt and decompress the original application’s DEX file resource
The encryption scheme is the well-known RC4 algorithm
The compression method is the lesser-known, but lightning fast LZ4
More about the decryption key below
The original DEX file is then dumped to disk, before the next stage takes place (dex2oat’ing, irrelevant in the context of this post)
The DEX file is eventually discarded from disk
Retrieving the decryption key statically appears to be quite difficult, as it is derived from the hash of various inputs, application-specific data bits, as well as a hard-coded string within libmd.so. It is unclear if this string is randomly inserted during the APK protection process, on the server side; verifying this would require multiple protected versions of the same app, which we do not have.
A dynamic approach is better suited. Using JEB, we can simply set a breakpoint right after the decryption routine, retrieve the original DEX file from disk, and terminate the app.
The native code is fairly standard. A couple of routines have been flattened (control-flow graph flattening) using llvm-obfuscator. Nothing notable, aside from their unconventional use of an asymmetric cipher to further obscure the creation of various small strings. See below for more details, or skip to the demo video directly.
Technical note: a simple example of white-box cryptography
The md library makes use of various encryption routines. A relatively interesting custom encryption routine uses RSA in an unconventional way. Given phi(n) [abbreviated phi] and the public exponent e, the method brute-forces the private exponent d, given that: d x e = 1 (mod phi)
phi is picked small (20) making the discovery of d easy (3).
The above is a simple example of white-box cryptography, in which the decryption keys are obscured and the algorithm customized and used unconventionally. At the end of the day, none of it matters though: if the original application’s code is not protected, it – or part of it – will exist in plain text at some point during the process lifetime.
The following short video shows how to use the Dalvik and ARM debuggers to retrieve the original DEX file.
This task can be easily automated by using the JEB debuggers API. Have a look at this other blog post to get started on using the API.
The Jar file aj_s.jar contains the original DEX file with a couple of additions, neatly stored in a separate package, meant to monitor the app while it is running – those have not been thoroughly investigated.
Overall, while the techniques used (anti-debugging tricks, white box cryptography) can delay reverse engineering, the original bytecode could be recovered. Due to the limited scope of this post, focusing on a single simple application, we cannot definitively assert that the protector part of AppSolid is broken. However, the nature of the protector itself points to its fundamental weakness: a wrapper, even a sophisticated one, remains a wrapper.
“SCAN. Scan your app for any security vulnerabilities. Get results in seconds. PROTECT. Binary protection applied in minutes. No additional coding necessary. TRACK. Monitor your app’s security status in real time. Stop hacking before it happens.” retrieved from https://appsolid.co/ on May 18 ↩
Appsolid bytecode and native components could use a serious clean-up though, debugging symbols and unused code were left out at the time of this analysis. ↩
“Encrypt assets. Hide important data as well. DexGuard can encrypt your asset files transparently, so hackers won’t just run away with them.”
Let’s have a look at an app containing a protected asset, 1.png. The hexdump below shows the encrypted 1.png. The PNG header is gone, as this is the encrypted version of the file.
The decryption routine is easy to spot: it appears the AssetManager.open() call was replaced by inline code that uses encrypted strings and reflection to instantiate a Cipher class and decrypt the asset file. We will use the script from the previous post to decode the strings and determine exactly what is happening. The screenshot below shows the decompiled routine in question, after it was marked up and commented using JEB (try-catch have been disabled for improved clarity, as they do not help understand how the routine works):
The references to the AssetManager class, open method, asset filename 1.png as well as cipher type AES-CFB have been encrypted using the string encryption feature. The decryption is achieved using standard Android classes, and the scheme used is AES with a random key and IV (outlined in red in.)
The final call to InputStream.available() is actually what the original method was doing. Therefore, the protection mechanism replaced AssetManager.open() by the following section of code:
We conclude that manually restoring protected assets is not difficult, although it cannot be easily automated. One way to improve that protection scheme would be to use custom decryption routines instead of SDK-provided classes, and/or generalize the use of reflection to reference the decryption code.
Dexguard is the commercial alter-ego of Proguard, for Android applications. While Proguard provides name obfuscation and unused code paths removal, Dexguard can also encrypt strings, classes, and assets, add tamper detection code, and use Java reflection facility to deeply obfuscate what a piece of code does.
Let’s have a quick look at one feature of Dexguard, the string encryption mechanism. The app in question is a Dexguard’ed version of Cyanide.
First, access to strings inside a protected class are replaced by calls to a static private decryption routine. That routine accesses a static private array containing all protected strings within the class. That array is created by the class initializer upon loading:
The decryption routine usually takes 3 arguments, although in some simpler cases, it may only take two arguments. The arguments may also be shuffled. (Note: from a Dalvik standpoint, that routine is oddly structured. Although JEB’s decompiled output is structurally fine, a for-loop construct cannot be created, mainly because of the irregular jump inside the first if-cond branch.)
The screenshot below represents that decryption routine. The name was obfuscated by Dexguard. Notice the constant numerals that have been circled in that code snippet: they are the decryption routine characteristics.
The following screenshot shows the same piece of code after marking it up. As you can see, the encryption algorithm is fairly simple: the current output character is derived from the previous character as well as the current byte of the encryptedStrings array. A constant (-8 in this case) is added to yield the final character. The initial length (length) as well as initial character (curChar) are also offset by hard-coded constants. These three constants (0x3E, 0x199, -8) characterize the decryption routine, and are unique for each routine. The curChar parameter acts as the decryption key.
You may customize this Python script to decrypt Dexguard strings:
# decryptor for Dexguard encrypted strings
strings = [0x4C, 0xE, 2, 9, -7, 0x10, -54, 0x3E, 0x17, -9, -44, 0x4C, 0xA, ...]
c0, c1, c2 = (0x199, 0x3E, -8)
def decrypt(length, curChar, pos):
length += c0
curChar += c1
r = ''
for i in range(length):
r += chr(curChar & 0xFF)
curEncodedChar = strings[pos]
pos += 1
curChar = curChar + curEncodedChar + c2
Combined with reflection (another feature of the protector), the protection can be pretty effective.