Generic Unpacking for APK

JEB 5.9 ships with a new component for Android app (APK) reverse engineering: the Generic Unpacker.

The generic unpacker will attempt to emulate the APK to collect Dex files that would be generated dynamically, at runtime (i.e. not classes[N].dex). Many APK protectors, whether used for legitimate or malicious purposes, employ such techniques to make the Dalvik bytecode more difficult to access and analyze.
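As a side illustration of what "collecting a Dex file" involves, here is a minimal sketch of how a buffer dumped at runtime could be recognized as a Dex payload from its magic bytes. The class and method names are hypothetical, and this is not a description of how the JEB unpacker works internally; it only relies on the documented Dex header layout ("dex\n", a three-digit version, a NUL byte).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of JEB): recognizes a Dex header in a raw buffer.
final class DexMagicCheck {
    // A Dex file starts with "dex\n", three ASCII digits (format version), then a NUL byte.
    static boolean looksLikeDex(byte[] buf) {
        if (buf == null || buf.length < 8) {
            return false;
        }
        if (buf[0] != 'd' || buf[1] != 'e' || buf[2] != 'x' || buf[3] != '\n') {
            return false;
        }
        for (int i = 4; i < 7; i++) {
            if (buf[i] < '0' || buf[i] > '9') {
                return false;
            }
        }
        return buf[7] == 0;
    }

    public static void main(String[] args) {
        byte[] dexHeader = "dex\n035\0".getBytes(StandardCharsets.ISO_8859_1);
        byte[] zipHeader = {'P', 'K', 3, 4, 0, 0, 0, 0};
        System.out.println("dex header recognized: " + looksLikeDex(dexHeader));
        System.out.println("zip header recognized: " + looksLikeDex(zipHeader));
    }
}
```

A check like this only says that a buffer starts like a Dex file; it says nothing about whether the rest of the file is well-formed.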

How to use the APK unpacker

First, open the target APK in JEB. In some cases, the unpacker module will let you know that there is a high probability that the APK was packed:

In many cases, that heuristic won’t be triggered and no specific hint issued. Either way, you may start the unpacker via the Android menu, Generic Unpacking…

Start the Generic Unpacker via the Android menu

An options dialog will be displayed. At the time of writing (JEB 5.9), the only available option is the maximum duration after which the unpacking process should be aborted (the default is set to 3 minutes, although in most cases, unpacking will stop well before that time-out).

Options dialog for the unpacker

Press “Start” and let the unpacker attempt to recover hidden dex files.

After it’s done, a frame dialog will list the unpacker results, consisting of dexdec MESSAGE notifications indicating which dex files were recovered, and where. The logger will display similar information.

For each recovered dex, a corresponding dex unit will be created under a sub-folder named “unpacked” (highlighted in green, located under the APK unit).

The unpacker has completed and is displaying its results (one dex file was recovered)

Analyzing the collected Dex

At this point, you may decide to analyze the recovered dex file(s) separately. In this case, simply open up the dex unit(s) under “unpacked”, and proceed as normal (another bytecode hierarchy, disassembly view, etc. will be opened).

Alternatively, you may want to integrate the recovered dex with the already existing bytecode. To do this, follow these steps:

  • Right-click on the recovered dex unit, select Extract to… and save the dex to a location of your choice
  • Navigate to the primary dex unit (generally named “Bytecode”) into which you want to integrate the saved dex, and open it with a double-click
  • Go to the Android menu, select Add/Merge additional Dex files… and select the file previously saved
  • The collected dex will be integrated with the existing bytecode unit, and the bytecode hierarchy will reflect that update

Limitations

The unpacker will not be able to handle all cases. Feel free to report any problem or bug you encounter; we will see if anything can be done to support most cases.

In upcoming updates, the unpacker will also provide a small API to allow users to write plugins in the form of emulator hooks to do whatever is needed to perform an unpacking task that the built-in unpacker would fail at.

Until next time! (The next blog shall be part 3 of “How to use JEB”, to analyze more complicated/obfuscated native code. Stay tuned.)

Nicolas

How To Use JEB – Auto-decrypt strings in protected binary code

This is the second entry in our series showing how to use JEB and its well-known and lesser-known features to reverse engineer malware more efficiently. Part 1 is here.

Today, we’re having a look at an interesting portion of a x86-64 Windows malware that carries encrypted strings. Those strings happen to be decrypted on the fly, the first time they’re required by some calling routine.

SHA256: 056cba26f07ab6eebca61a7921163229a3469da32c81be93c7ee35ddec6260f1. The file is not packed, it was compiled for Intel x86 64-bit processors, using an unknown version of Visual Studio. The file is dropped by another malware and its purpose is reconnaissance and information gathering. Let’s load it in JEB 5.8 and do a standard analysis (default settings).

Initial decompilations

For the sake of showing what mechanism is at play, we’re first looking at sub_1400011F0. Let’s decompile it by pressing the TAB key (menu: Action, Decompile…).

Raw decompilation of sub_1400011F0, before examining its callees.

Then, let’s decompile the callee sub_140001120.

JEB can now thoroughly look at the routine and refine the initial prototype that was applied earlier, when the caller sub_1400011F0 was decompiled. It is now set to: void(LPSTR).

The code itself is a wrapper around CreateProcess; it executes the command line provided as argument.

sub_140001120 executes a command-line with CreateProcess. Note the refined prototype, void(LPSTR).

Press escape to navigate back to the caller, or alternatively, examine the callers by pressing X (menu: Action, Cross-references…) and select sub_1400011F0. You will notice that JEB is now warning us that the decompilation is “stale”.

The initial decompilation of sub_1400011F0 is stale after the decompilation of sub_140001120 yielded a better prototype.

Second decompilation

The reason is that the prototype of sub_140001120 was refined by the second decompilation (to void(LPSTR)), and the method can be re-decompiled to a more accurate version.

Let’s redecompile it: press F5 (menu: Window, Refresh). You can see that second decompilation below. What happened to the calls to sub_140001040?

Second decompilation of sub_1400011F0, showing some decrypted strings instead of calls to sub_140001040.

String auto-decryption

Notice the following:

  • A “deobfuscation score” note was added as a method comment (refer to part 1 of the series)
  • The calls to sub_140001040 are gone; they have been replaced by dark-pink strings

JEB also notified us in the console:

Notifications about decrypted strings replaced in decompiled code.

Dark-pink strings represent synthetic strings not present in the binary itself. Here, they are the result of JEB auto-decrypting buffers by emulating the calls to routine sub_140001040, which was identified as a string provider. Indeed, the decompilation of sub_140001120 helped, since the inferred parameter type, LPSTR, was back-propagated to the call sites, where the argument was the return value of sub_140001040.

Auto-decryption can be very handy. In the case of this malware, we can immediately see what will be executed by CreateProcess: shells executing whoami and dir and redirecting outputs to files in the local folder. However, if necessary, this feature can be disabled via the “Decryptor Options” in the decompiler properties:

  • Menu: Options, Back-end properties… to globally disable this in the future, except for your current project
  • Menu: Options, Specific Project properties… for the current project only
  • Or you may simply redecompile the method with CTRL+TAB (menu: Action, Decompile with options…) and disable string decryptor for specific code

The string auto-decryptor may be enabled or disabled in the options

The decryptor routine

What is sub_140001040 anyway? Let’s navigate to the routine in the disassembly and decompile it.

A raw decompilation of the decryptor code, sub_140001040

After examination of the code, we can adjust things slightly:

  • The global gvar_140022090 is an array of PCHAR (double-click on the item; rename it with N; change the type to a PCHAR using Y; create an array from that using the * key).
  • The prototype is really PCHAR(int), we can adjust that with Y.
  • The first byte of an entry into encrypted_strings is the number of encrypted bytes remaining in the string; if 0, it is fully decrypted and subsequent calls will not attempt to decrypt bytes again.
  • The variable v3 is the key; let’s rename it with N. Note that the key at (i) is the sum of the previous two keys used by indices (i-1), (i-2); the initial tuple is (0, 1). This looks like a Fibonacci sequence. 1

The decryptor (sub_140001040) after analysis.
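The observations above can be put together in a short sketch. It assumes two things that the post does not state: that each encrypted byte is combined with its key using XOR, and that keys wrap around at 8 bits. The class and method names are hypothetical; what the post does describe is the layout (first byte of an entry holds the count of bytes still encrypted) and the Fibonacci-like key schedule starting at (0, 1).

```java
import java.nio.charset.StandardCharsets;

// Sketch of a lazy, in-place string decryptor with a Fibonacci-like key stream.
// ASSUMPTIONS: XOR as the combining operation, keys truncated to 8 bits.
final class FibStringDecryptorSketch {
    // entry[0] = number of bytes still encrypted; entry[1..] = NUL-terminated payload.
    static String decryptInPlace(byte[] entry) {
        int remaining = entry[0] & 0xFF;
        int prev = 0;
        int cur = 1; // initial tuple (0, 1)
        for (int i = 1; i <= remaining && i < entry.length; i++) {
            entry[i] ^= (byte) prev;         // key for this byte
            int next = (prev + cur) & 0xFF;  // next key = sum of the previous two
            prev = cur;
            cur = next;
        }
        entry[0] = 0; // mark as decrypted: later calls skip the loop
        int end = 1;
        while (end < entry.length && entry[end] != 0) {
            end++;
        }
        return new String(entry, 1, end - 1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "abc" encrypted with keys 0, 1, 1 under the XOR assumption
        byte[] entry = {3, 'a', 'b' ^ 1, 'c' ^ 1, 0};
        System.out.println(decryptInPlace(entry));
        System.out.println(decryptInPlace(entry)); // second call is a no-op on the bytes
    }
}
```

The counter in the first byte is what makes the scheme "decrypt on first use": once it is zero, the routine simply hands back the already-decrypted buffer.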

Comparison with GHIDRA

For comparison’s sake, here are GHIDRA 11 decompilations.

The caller (sub_1400011F0) decompiled by GHIDRA 11.0.
The decryptor (sub_140001040) decompiled by GHIDRA 11.0.
The CreateProcess wrapper (sub_140001120) decompiled by GHIDRA 11.0. Notice that the low-level structure initialization code adds quite a bit of confusion.

Conclusion

JEB decompilers 2 do their best to clean up and restore code, and that includes decrypting strings when it is deemed reasonable and safe.

That concludes our second entry in this “How to use JEB” series. In the next episodes, we will look at other features and how to write interesting IR and AST plugins to help us further deobfuscate and beautify decompiled code.

As always, thank you for your support, and happy new year 2024 to All 😊 – Nicolas

  1. Interestingly, the JEB assistant (call it with the BACKTICK key, or menu: Action, Request Assistant…) would like to rename this method to “fibonacci_sequence“! Not quite it, but that’s a relevant hint!
  2. Note the plural: dexdec – the Dex decompiler – has had string auto-decryption via emulation for a while; its users are well-accustomed to seeing dark-pink strings in deobfuscated code!

How To Use JEB – Analyze an obfuscated win32 crypto clipper

We’re kicking off a malware analysis series explaining how to use JEB Decompiler to perform reverse engineering tasks ranging from out-of-the-box actions to complex use cases requiring scripts or custom plugins.

In this first entry, we look at a Windows malware compiled for x86 32-bit targets. The malware is an Ethereum cryptocurrency stealer. It monitors and intercepts clipboard activity to find and replace wallet addresses by an address of its own — presumably, one controlled by the malware authors to collect stolen ether.

Quick look at the malware

The file is 81 KB in size and is compiled for x86 platforms. Although it does not appear to be packed, most metadata elements of the PE header were stripped. There is no Rich header data or timestamp.

SHA256: 503b2dc50262be583633db7b52dca9bcadc698413270047c209818436196c987

Quick look at the file in Hiew

If you are familiar with JEB, its terminology, and the organization of its UI elements, you may skip the next section and go directly to “Examining the code”.

Opening the file in JEB

Let’s fire up JEB. Any recent build (5.7+) with the x86 analysis modules and decompiler will do, i.e. JEB Community Edition or JEB Pro.

We open the file and keep the default settings
A view of the GUI after the initial analysis (from top-left, clockwise: project explorer, main workspace, and code hierarchy)

Project and units

The top-left view shows the project, along with a single artifact (the input file) and the analysis units created by JEB:

  • The artifact file has a blue-round icon
  • The top-level unit is a winpe unit
  • It has one child unit at the moment, named “x86 image”, of type x86.

The bottom-left view shows a list of code routines resulting from the analysis of the file.

Disassembly

By default, the main panel shows the disassembly window.

You may press the SPACE bar to switch to a graph view of the code (menu: Action, Graph…). In the graph view, only a single method is rendered at a time.

CFG (control flow graph) view of a disassembled routine

PE unit

If you wish to have a look at the PE file in more detail, open the winpe unit. Double-click the corresponding node in the project hierarchy.

View of a winpe unit’s “Overview” fragment

The winpe unit view provides several pieces of information, organized in fragments that can be seen below the unit view: Description, Hex Dump, Overview (the default fragment), Sections, Directory Entries, Symbols, etc.

Note that if the PE had not been stripped, we would probably see a compilation timestamp as well as additional sub-units detailing the Rich Header data. For Windows executables, that data is important to perform fine-grained compiler identification.
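For readers curious about where that timestamp lives, here is a minimal sketch that reads the COFF TimeDateStamp field straight from a PE buffer. The class name is hypothetical and this is not JEB code; the offsets come from the PE/COFF layout (e_lfanew at 0x3C, TimeDateStamp 8 bytes after the "PE\0\0" signature). A scrubbed header would typically yield 0 or a meaningless value here, and the Rich header is not parsed at all.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper (not part of JEB): extracts the COFF TimeDateStamp of a PE file.
final class PeTimestampReader {
    // Returns the timestamp (seconds since the Unix epoch), or -1 if the buffer is not a PE.
    static long readTimeDateStamp(byte[] file) {
        if (file == null || file.length < 0x40) {
            return -1;
        }
        ByteBuffer bb = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN);
        if (bb.getShort(0) != 0x5A4D) { // "MZ"
            return -1;
        }
        int peOffset = bb.getInt(0x3C); // e_lfanew
        if (peOffset < 0 || peOffset > file.length - 12) {
            return -1;
        }
        if (bb.getInt(peOffset) != 0x00004550) { // "PE\0\0"
            return -1;
        }
        // COFF header: Machine (2 bytes), NumberOfSections (2 bytes), TimeDateStamp (4 bytes)
        return bb.getInt(peOffset + 8) & 0xFFFFFFFFL;
    }
}
```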

The Symbols tab lists all symbols advertised by the PE, including imported and exported routines. For example, if you filter on “clip”, you can see multiple win32 routines relating to clipboard access, such as OpenClipboard or SetClipboardData:

The Symbols fragment of the winpe unit view, with a filter applied (“clip”)

Examining the code

Let’s go back to the disassembly offered by the x86 unit. First, notice that the code hierarchy view does not seem to contain well-known methods (static code), typically standard library routines linked at compile-time.

Let’s see why by looking at which siglibs (signature libraries) were applied during the initial analysis (menu: Native, Signature Libraries…). It looks like none were loaded:

The Signature Libraries dialog

Library code identification

Normally, when JEB performs the initial auto-analysis of the code, compiler identification is used to determine whether well-known signature libraries of static code (siglibs) should be loaded and applied to the binary. In this case, compiler identification failed because all header data had been discarded. JEB decided to not load and apply signatures.

To apply them manually, tick the “MSVC x86” boxes. (An alternative is to let JEB know that the file was compiled with MSVC before the analysis starts: when opening the artifact, once the Options panel is displayed, the user may decide to force the compiler to a set value.)

Forcing a compiler setting before the initial analysis

After doing either of the above ((a) file re-analysis with a compiler identification pre-set; or (b) manual siglibs application), several methods are identified as MSVC code:

Light-blue areas mean the code was matched against well-known signatures

Entry-point and WinMain

Navigate to the executable entry-point (menu: Native, Go to entry-point…).

In the general case, the entry-point of a Windows PE compiled with MSVC is not the high-level entry-point that will contain meaningful code. Although it is relatively easy to find WinMain with a bit of experience, there is a JEB script to help you as well, FindMain.py (available in the samples-script folder, also available on GitHub). Open up the script selector with F2 (menu: File, Scripts, Script selector…).

Run a JEB Python script inside the GUI client

Select the desired script and execute it. The result is displayed in the console:

...
Found high-level entry-point at 0x401175 (branched from 0x401D38)
Renaming entry-point to 'winmain'
...

The code at 0x401175 was auto-renamed to winmain (menu: Action, Rename…).

Initial decompilation

Let’s decompile that method by pressing the TAB key (menu: Action, Decompile…).

Initial decompilation of WinMain

Two items of interest to note at this point:

  • There is lots of code that appears to be junk or garbage
  • There is a note about some “deobfuscation score”

Junk code

The decompiled WinMain method is about 300 lines of C code. Much of it consists of assignments writing to program globals. At first glance, it looks like it could be some sort of obfuscation. Let’s look at the corresponding assembly code:

Press TAB to go back from a decompilation to the closest matching machine code disassembly line

The snippets have the following structure:
push GARBAGE / pop dword [gXXX]

Or that, assuming edi is callee-saved:
mov edi, gXXX / ... / mov dword [edi+offset], GARBAGE

Later on, we will see how to remove this clutter to make the analysis more pleasant.

Deobfuscation score

A note “deobfuscation score: 6” was inserted as a method comment. That score indicates that some “advanced” clean-up was performed. In this case, a careful examination (as well as a comparison against a decompilation with UNSAFE optimizers turned off, which you can do by redecompiling the method with CTRL+TAB (menu: Action, Decompile with Options…)) will point to this area of code:

The opaque predicate calculation is highlighted in green using CTRL+M (menu: Action, Toggle Highlight…)

This predicate looks like the following: if(X*(X+1) % 2 == 0) goto LABEL.

With X being an integer, X*(X+1) is always even, since one of two consecutive integers is even. Therefore, the predicate will always evaluate to true. JEB cleaned this up automatically. (While this particular predicate is trivial, JEB will also attempt to break truly opaque predicates, using the Z3 SMT solver.)
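The claim is easy to sanity-check on 32-bit wrapping integers, which is what the compiled code manipulates. Wrapping arithmetic keeps the low bit of the product intact, so the parity argument still holds when X+1 or the product overflows. The sketch below (hypothetical class name) checks a range of values plus the two extremes.

```java
// Sanity check of the opaque predicate X*(X+1) % 2 == 0 on 32-bit wrapping integers.
final class OpaquePredicateCheck {
    // The predicate as it appears in the decompiled code.
    static boolean predicate(int x) {
        return (x * (x + 1)) % 2 == 0;
    }

    // Returns true if the predicate holds for every value in [from, to].
    static boolean holdsOnRange(int from, int to) {
        // a long counter avoids an endless loop when 'to' is Integer.MAX_VALUE
        for (long x = from; x <= to; x++) {
            if (!predicate((int) x)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(holdsOnRange(-1_000_000, 1_000_000));
        System.out.println(predicate(Integer.MAX_VALUE) && predicate(Integer.MIN_VALUE));
    }
}
```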

Comparison with GHIDRA

For a point of comparison, you may have a look at the same method decompiled by GHIDRA 10.4 here (default settings were used, just like we did with JEB). The predicate is not cleaned up adequately: extra control-flow edges are left over, leading to AST structuring confusion.

Cleaning up the code

Let’s start with decluttering this code. First of all, why couldn’t the decompiler clean it up on its own? If the globals written to are never read with meaningful intent, then they could be discarded.

The issue is that this is very hard to ensure in the general case. However, in specific cases, sometimes involving manual review, some global written-to memory range may be deemed useless, as is the case here. How do we provide this information to the decompiler? Well, as of version 5.7, we cannot! 1 What we can do though is write a decompiler plugin to clean up the offending IR, and in the process, generate clean(er) code.

IR cleaner plugin

The decompiler accepts several types of plugins, including IR optimizers (they work on the Intermediate Representation of a routine, as it moves up the decompilation pipeline), and AST optimizers (to clean up or reformat the generated abstract syntax tree of the pseudo-code). In most cases, IR optimizers are well-suited to perform code clean-up or deobfuscation tasks (refer to this blog post for a detailed comparison).

We will write the plugin in Java (we could also write it in Python). It will do the following:

  • Examine each IR statement of a CFG
  • Check if the statement is writing an immediate to some global array: *(array + offset) = value
  • If so, check the array name. If it starts with the prefix “garbage”, consider the statement useless and replace it by a Nop statement

Writing IR plugins is out of scope for this post; we will go over that in detail in a future entry. In the meantime, you can download the plugin code here. Drop the Java file in JEB’s coreplugins/scripts/ folder. There is no need to close and re-open JEB; it will be picked up at the next decompilation.

public class GarbageCleaner extends AbstractEOptimizer {

	@Override
	public int perform() {
		int cnt = 0;

		for (BasicBlock<IEStatement> b : cfg) {
			for (int i = 0; i < b.size(); i++) {
				IEStatement stm = b.get(i);
				if (stm instanceof IEAssign && stm.asAssign().getDstOperand() instanceof IEMem
						&& stm.asAssign().getSrcOperand() instanceof IEImm) {
					IEMem dst = stm.asAssign().getDstOperand().asMem();
					IEGeneric e = dst.getReference();
					// [xxx + offset] = immediate
					if (e.isOperation(OperationType.ADD)) {
						IEOperation op = e.asOperation();
						if (op.getOperand1().isVar() && op.getOperand2().isImm()) {
							IEVar v = op.getOperand1().asVar();
							IEImm off = op.getOperand2().asImm();
							if (v.isGlobalReference()) {
								long addr = v.getAddress();
								INativeContinuousItem item = ectx.getNativeContext().getNativeItemAt(addr);
								// logger.info("FOUND ITEM %s", item.getName());
								if (item != null && item.getName().startsWith("garbage")) {
									long itemsize = item.getMemorySize();
									if (off.canReadAsLong() && off.getValueAsLong() + dst.getBitsize() / 8 < itemsize) {
										logger.info("FOUND GARBAGE CODE");
										b.set(i, ectx.createNop(stm));
										cnt++;
									}
								}
							}
						}
					}
				}
			}
		}

		if (cnt > 0) {
			cfg.invalidateDataFlowAnalysis();
		}
		return cnt;
	}
}

Note that by design, the plugin is not specific to this malware. We will be able to re-use it in future analyses: all global arrays prefixed with “garbage” will be treated by the decompiler as junk recipients, and cleaned-up accordingly!

Defining the garbage array

At this point, we need to determine where that array is. Some examination of the code leads to the following boundaries (roughly): start at 0x41597E, spans over 0x100 bytes. Navigate to the disassembly; create an array using the STAR key (menu: Native, Create/Edit Array…); specify its characteristics.

Creating a global array of 0x100 bytes. This is the garbage array.

As soon as the array is created, the disassembly will change to what can be seen below. At the same time, the decompilations using that array will be invalidated; that is the case for WinMain. You may see that another extra-comment was added by the decompiler: “Stale decompilation – Refresh this view to re-decompile this code”. Such decompilations are read-only until a new one is generated.

The array is now created. The decompilation of WinMain becomes stale.

Before redecompiling, remember we need to rename our array with a label starting with “garbage”. Set the caret on the array, hit the key N (menu: Action, Rename…) and set your new name, e.g., garbageArray1.

Now you may go back to the decompilation view of WinMain and hit F5 (menu: Window, Refresh) to regenerate a decompilation.

Decompiled WinMain after the garbage array-assigns were cleaned-up by the plugin

The code above is much nicer to look at – and much easier to work on!

Quick analysis

The method at 0x401000, called by WinMain, decrypts the thief’s wallet address and generates two hexstring versions of it (ASCII and Unicode).

Decrypting the target wallet address. The decompilation is shown after proper types were applied on the data structures accessed (encrypted wallet address, hexstrings, etc.) and better names given to those vars

The loop in WinMain is doing the following:

  • Every second, it queries the Windows clipboard with OpenClipboard
  • It checks if it contains text strings or unicode strings
  • If the string is 42 characters in length and starts with “0x”, it proceeds (an Ethereum wallet address is 20 bytes, therefore its hexadecimal representation is 40 characters, plus 2 characters for the “0x” prefix)
  • It checks if the string is not the attacker’s wallet address
  • If not, it replaces the contents of the clipboard data by the attacker’s wallet address using SetClipboardData
  • Finally, the other contents found in the clipboard are discarded
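The decision logic of that loop can be summarized in a few lines. The sketch below is a paraphrase in Java, not the malware's code: the names are hypothetical, the clipboard API calls are left out, and the exact string comparison used by the malware (case sensitivity in particular) is an assumption. Only the two checks described above (length 42, “0x” prefix) are applied; no hex-digit validation is assumed.

```java
import java.util.Arrays;

// Paraphrase of the clipper's decision logic (hypothetical names, no clipboard access).
final class ClipperTargetCheck {
    // Mirrors the checks described above: 42 characters, starting with "0x".
    static boolean looksLikeEthAddress(String s) {
        return s != null && s.length() == 42 && s.startsWith("0x");
    }

    // Returns what the clipboard would contain after one iteration of the loop.
    // ASSUMPTION: an exact, case-sensitive comparison against the attacker's address.
    static String replacementFor(String clipboard, String attackerAddress) {
        if (looksLikeEthAddress(clipboard) && !clipboard.equals(attackerAddress)) {
            return attackerAddress;
        }
        return clipboard;
    }

    // Builds an obviously fake address for demonstration purposes.
    static String fakeAddress(char fill) {
        char[] digits = new char[40];
        Arrays.fill(digits, fill);
        return "0x" + new String(digits);
    }

    public static void main(String[] args) {
        String attacker = fakeAddress('1');
        System.out.println(replacementFor(fakeAddress('a'), attacker));
        System.out.println(replacementFor("not an address", attacker));
    }
}
```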

Well-known literals

In JEB, you may replace immediates by well-known literals found in type libraries (aka typelibs, such as the win32 typelibs, which were automatically loaded when the analysis of the PE file started). To do that, select the immediate, then hit CTRL+N (menu: Action, Replace…), and select the desired literal. 2

For example, per the MSDN, GetClipboardData uses CF_xxx constants to indicate the type of data. We can ask JEB to replace GetClipboardData(13) by GetClipboardData(CF_UNICODETEXT) using the Action/Replace handler:

Replacing 13 by CF_UNICODETEXT in a call to GetClipboardData
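Conceptually, this replacement is a lookup from a numeric value to a symbolic name in a given context. The sketch below shows the idea with a handful of standard win32 clipboard format values; the class is hypothetical and has nothing to do with how JEB typelibs are stored or queried.

```java
import java.util.HashMap;
import java.util.Map;

// Toy lookup from clipboard format values to their win32 names (subset only).
final class ClipboardFormatNames {
    private static final Map<Integer, String> NAMES = new HashMap<>();
    static {
        NAMES.put(1, "CF_TEXT");
        NAMES.put(2, "CF_BITMAP");
        NAMES.put(7, "CF_OEMTEXT");
        NAMES.put(13, "CF_UNICODETEXT");
        NAMES.put(15, "CF_HDROP");
    }

    // Returns the symbolic name if known, else the decimal value unchanged.
    static String nameOf(int value) {
        return NAMES.getOrDefault(value, Integer.toString(value));
    }

    public static void main(String[] args) {
        System.out.println("GetClipboardData(" + nameOf(13) + ")");
    }
}
```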

Conclusion

That concludes the first blog in this “How to use JEB” series. In the next episodes, we will look at other features, dig deeper into writing IR plugins, look into types and types creation, and reverse other architectures, including exotic code.

To learn more, we encourage you to:

  • Explore this blog, as it contains many technical entries and how-to’s.
  • Look at the sample code (scripts and plugins) shipping with JEB; it will get you started on using the API to write your own extensions.
  • Join our Slack channel to engage with other users in the community and ask questions if you’re stuck on anything.

Thank you very much & Stay tuned 🙂 Happy Holiday to All 🎄

  1. The plugin written to analyze this malware may ship in some upcoming version of JEB.
  2. In many cases, JEB will do that automatically, and it should be the case here.

JEB Assistant

Update: With JEB 5.6, several restrictions are lifted, making the Assistant available for Java decompiled output generated by dexdec (it was previously limited to C output generated by gendec).

Starting from JEB 5.2, you may use the experimental “JEB Assistant” to infer names for decompiled methods and method parameters.

Below is a decompiled aarch64 routine found in the BPFDoor malware. A raw decompilation does not produce any useful name (the default routine name is sub_40157C).

An unnamed arm64 decompiled routine

You may click the “Call the Assistant” button (also available via the Action menu, Request Assistant handler, or the back-tick keyboard shortcut) to query the assistant via JEB.IO. At the time of writing, a JEB.IO account is not required to access the assistant.

Upon first request, a disclaimer will be shown, letting you know that the decompiled code must be sent to our server:

The disclaimer is shown the first time the assistant is called

The assistant may return a better name for the method and its parameters. Sometimes, the names may be incorrect, yet provide some insight into what the method is doing. Other times, they may be entirely out of scope! It is always better to take the provided results as hints, rather than absolute truths.

In the case of our mysterious method, the assistant did provide valuable information: decryptData(data, size, key). Indeed, the method is a decryption function: more specifically, RC4 with a pre-computed S-box. The parameter names are (almost) correct.
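For reference, this is what textbook RC4 looks like when the key schedule and the keystream generation are separated. It is a generic sketch, not the BPFDoor routine: the class and method names are made up, and in a sample shipping a pre-computed S-box, only the second method would have a counterpart in the binary (with the table passed in where the assistant guessed "key").

```java
// Textbook RC4, split into key scheduling and keystream application.
final class Rc4Sketch {
    // Key-scheduling algorithm: derives the 256-entry S-box from the key.
    static int[] keySchedule(byte[] key) {
        int[] s = new int[256];
        for (int i = 0; i < 256; i++) {
            s[i] = i;
        }
        int j = 0;
        for (int i = 0; i < 256; i++) {
            j = (j + s[i] + (key[i % key.length] & 0xFF)) & 0xFF;
            int t = s[i];
            s[i] = s[j];
            s[j] = t;
        }
        return s;
    }

    // Keystream generation and XOR; works on a copy so the S-box can be reused.
    // Encryption and decryption are the same operation.
    static byte[] crypt(int[] sbox, byte[] data) {
        int[] s = sbox.clone();
        byte[] out = new byte[data.length];
        int i = 0;
        int j = 0;
        for (int n = 0; n < data.length; n++) {
            i = (i + 1) & 0xFF;
            j = (j + s[i]) & 0xFF;
            int t = s[i];
            s[i] = s[j];
            s[j] = t;
            out[n] = (byte) (data[n] ^ s[(s[i] + s[j]) & 0xFF]);
        }
        return out;
    }
}
```

Spotting a 256-byte table that is swapped in place while a running XOR is applied to a buffer is usually enough to recognize this cipher in decompiled code.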

You may decide to apply the suggested method name directly. The suggested parameter names are not applied automatically.

The assistant provides suggestions; it is up to the user to apply them

This feature is experimental. Currently, several limitations apply:

  • The assistant is limited to decompiled native routines. It will not work for dex/dalvik decompilations.
  • The assistant will refuse to work on overly long routines (whose decompilation exceeds several thousand characters).
  • The assistant is not available via the JEB API and requests are rate-limited (at most one every 5 seconds).

On the plus side, a JEB.IO account is not required at this time to use the assistant! Anybody can use it to (sometimes) gain insight into obscure decompilations. We hope it will help you in your reverse-engineering efforts. Please let us know your feedback through the usual channels (email, Slack, etc.).

Until next time 🙂 — Nicolas.

Control-flow unflattening in the wild

Both JEB decompiler engines 1 ship with code optimizers capable of rebuilding methods whose control-flow was transformed by flattening obfuscators.

Image © Tigress (University of Arizona)

Control-flow flattening, sometimes referred to as chenxification2, is an obfuscation technique employed to destructure a routine control-flow. While a compiled routine is typically composed of a number of basic blocks having low ingress and egress counts, a flattened routine may exhibit an outlier node having high input and high output edge counts, and generally, a very high centrality in the graph (in terms of vertex betweenness). Practically speaking, the original method M is reduced to a many-way conditional block H evaluating an expression VPC, dispatching the flow of execution to units of code, each one performing a part of M, updating VPC, and looping back to H. In effect, the original structured code is reduced to a large switch-like block, whose execution is guided by a synthetic variable VPC. Therefore, the original flow of control, critical to infer meaning while performing manual reverse-engineering, is lost. 3
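To make the description concrete, here is a toy example (hypothetical code, not the output of any particular obfuscator) showing a small routine next to a hand-flattened version of it. The switch plays the role of the dispatcher H and the variable vpc is the synthetic VPC; each case performs a fragment of the original method, updates vpc, and loops back.

```java
// A routine and a hand-flattened equivalent, to illustrate the dispatcher/VPC model.
final class FlatteningDemo {
    // Original, structured code.
    static int gcd(int a, int b) {
        while (b != 0) {
            int t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    // Same logic after flattening: the loop structure is hidden behind vpc.
    static int gcdFlattened(int a, int b) {
        int vpc = 0;
        int t = 0;
        while (true) {
            switch (vpc) {
            case 0: // loop condition
                vpc = (b != 0) ? 1 : 3;
                break;
            case 1: // first half of the loop body
                t = a % b;
                vpc = 2;
                break;
            case 2: // second half of the loop body, then back to the condition
                a = b;
                b = t;
                vpc = 0;
                break;
            case 3: // exit
                return a;
            default:
                throw new IllegalStateException("unexpected vpc: " + vpc);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(gcd(48, 18) + " " + gcdFlattened(48, 18));
    }
}
```

An unflattener has to recover the while loop from the sequence of vpc values; real obfuscators additionally encode or scramble those values, which this toy version does not.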

We upgraded dexdec‘s control flow unflattener earlier this year. 4 The v2 of the unflattener is more generic than our original implementation. It is able to cover cases in which the obfuscated code does not map to the clean model presented above, e.g. cases where the dispatcher stands out.

This week, we encountered an instance of code that was auto-deobfuscated to clean code and thought it’d be a good example to show how useful generic deobfuscation of such code can be. It seems that the obfuscator that was used to protect the original code was BlackObfuscator, a project used by clean apps and malware alike.

Hash: 92ae23580c83642ad0e50f19979b9d2122f28d8b3a9d4b17539ce125ae8d93eb

Before deobfuscation.

After deobfuscation, the code looks like:

After deobfuscation.

If you encounter examples where the unflattener does not perform adequately, please let us know. We’ll see if they can be fixed or upgraded to cover obfuscation corner-cases.

Thank you & until next time — Nicolas.

  1. dexdec is JEB’s dex/dalvik decompiler; gendec is JEB’s generic decompiler, used for native code and any code other than dex/dalvik
  2. A term coined by University of Arizona’s Prof. Christian Collberg, since an early description of this technique was presented by Dr. Chenxi Wang in her PhD thesis
  3. Control-flow flattening can be seen as a particular case of code virtualization, which was covered in previous blog entries.
  4. JEB 4.25 released on Jan 17 2023

Recovering JNI registered natives, recovering protected string constants

This is part 2 of the blog that introduced the major addition that shipped with JEB Pro 4.29: the ability for the dex decompiler to call into the native analysis pipeline, the generic decompiler and native code emulator.

Today, we demo how to use two plugins shipping with JEB 4.30, making use of the emulators to recover information protected by a native code library found in several APKs, libpairipcore.so.

Recovering statically registered native routines

The first plugin can be used to discover native routines registered via JNI’s RegisterNatives. As a reminder, when calling a native method from Java, the JNI will see if exported routines with specific names derived from the Java method signature exist in the process. Alternatively, bindings between a Java native method and its actual body can be done with RegisterNatives. Typically, this is achieved in JNI_OnLoad, the primary entry-point. However, it does not need to be; other techniques exist to further obfuscate the target call site of a Java native method, such as unregistration/re-registration, the obfuscation of JNI_OnLoad, etc. More information can be found here.
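As an illustration of the first (name-based) resolution mechanism, the sketch below derives the short export name that the JNI would look for, following the escaping rules of the JNI specification. The class is hypothetical and unrelated to the plugin. Only the short form is produced; overloaded native methods use a longer form with the argument signature appended, which is not covered here.

```java
// Derives the short JNI export name for a native method (JNI name-mangling rules).
final class JniNameMangler {
    // Escapes one name component: '.' or '/' -> "_", '_' -> "_1", ';' -> "_2", '[' -> "_3",
    // any other non-alphanumeric character -> "_0xxxx" (lowercase hex UTF-16 code unit).
    static String mangle(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '.' || c == '/') {
                sb.append('_');
            } else if (c == '_') {
                sb.append("_1");
            } else if (c == ';') {
                sb.append("_2");
            } else if (c == '[') {
                sb.append("_3");
            } else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')) {
                sb.append(c);
            } else {
                sb.append(String.format("_0%04x", (int) c));
            }
        }
        return sb.toString();
    }

    // Short name: "Java_" + mangled fully-qualified class name + "_" + mangled method name.
    static String shortName(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }

    public static void main(String[] args) {
        System.out.println(shortName("com.pairip.VMRunner", "executeVM"));
    }
}
```

When no export with such a name exists in the library, as is common with protected code, the binding was most likely made through RegisterNatives, which is the case the plugin addresses.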

In its current state, the plugin will attempt to emulate a SO library’s JNI_OnLoad on its own, without the context of the app process it would normally run on. The advantage is that the plugin is usable on libraries recovered without their container app (APK or otherwise). The drawback is that it may fail in complex cases, since the full app context is not available to this plugin. (Note that the second plugin does not suffer from this limitation.)

Open an APK or Elf SO file(s), run the “Recover statically-registered natives (Android)” plugin.
Set optional name filters or architecture filters as needed.
The results will be visible in the log. In this case, it looks like the aarch64 library libpairipcore.so registered one method for com.pairip.VMRunner.executeVM, and mapped it to a routine at 0x5F180.

Recovering constants removed from the Dex

The second plugin makes use of an IEmulatedAndroid object to simulate an execution environment and execute code that may be restoring static string constants removed from the Dex by code protection systems.

We can imagine that the code protection pass works as such:

String constants are being removed during a protection pass.

The implementation details of restore() are not relevant to this blog entry. In the case of that particular app, it involves calling into a highly obfuscated native library called libpairipcore.so.

The plugin requires a full APK. It will emulate a static method selected by the user and let them know about the constants that were restored.

The plugin workflow is as follows:

After loading an APK, the plugin may let the user know that the code was protected.
Execute the “Recover removed Dex constants” plugin.
The user will be asked to input the no-arg static method that should be simulated. If a suitable one is found, it may be pre-populated by the plugin.
The execution can be lengthy, from several seconds to several minutes. Recovered strings are registered as field comments as well as decompiler events in the relevant dexdec unit of your project.

Conclusion

That’s it for today. Make sure to update to JEB Pro 4.30 if you want to use those plugins.

I would encourage power-users to explore JEB’s API, in particular IDState, EState/EEmulator and IEmulatedAndroid, if they want to experiment or work on code that requires specific hooks (dex hooks, jvm sandbox hooks, native emu hooks, native memory hooks – refer to the registerXxxHooks methods in IDState) for the emulators to operate properly.

Until next time — Nicolas.

Android JNI and Native Code Emulation

JEB 4.29 finally bridges the gap between the dex analysis modules in charge of code emulation (dexdec’s IDState and co.) and their counterparts in the native code analysis pipeline (gendec’s EEmulator, EState and co.).

The emulation of JNI routines from dexdec unlocks use-cases that are now becoming commonplace, such as:

  • Object consumption relying on native code calls to make reverse-engineering harder. The typical case is the retrieval of encrypted strings where part of the decryption code is bytecode, part is native code.
  • General app tweaking done on the native side, such as field setting, field reading, method invocation, object creation, etc.

Example

Here is an example of what could not be done by JEB <4.29:

//
// dex code:
//

package a.b;

class X {
  ...
  native String decrypt(char[] array, int key1, int key2);
  ...
  String f() {
    return decrypt(new char[]{'K', 'F', 'C'}, 4, 3);
  }
  ...
}

//
// native code:
//

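// simplified C code for routine `dec`, mapped to `a.b.X.decrypt`
#include <jni.h>

jstring dec(JNIEnv* env, jobject thiz, jcharArray array, jint a, jint b) {
  jsize len = (*env)->GetArrayLength(env, array);
  jchar out[len];
  // a jcharArray is an opaque handle: copy its contents into a native buffer first
  (*env)->GetCharArrayRegion(env, array, 0, len, out);
  for(jsize i = 0; i < len; i++) {
    out[i] = (jchar)(out[i] - (a - b));
  }
  return (*env)->NewString(env, out, len);
}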

JEB used to decompile X.f() to:

String f() {
  return decrypt(new char[]{'K', 'F', 'C'}, 4, 3);
}

JEB 4.29, if the native emulator is enabled, is able to return a simpler version:

String f() {
  return "JEB";
}
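To see why the result is "JEB", the computation done by the native routine can be modeled in plain Java. This is only an illustrative model of the example above (the class name is made up), not code produced or used by JEB:

```java
// Plain-Java model of the native `dec` routine from the example above.
final class NativeDecryptModel {
    static String decrypt(char[] array, int key1, int key2) {
        char[] out = new char[array.length];
        for (int i = 0; i < array.length; i++) {
            // each character is shifted down by (key1 - key2)
            out[i] = (char) (array[i] - (key1 - key2));
        }
        return new String(out);
    }

    public static void main(String[] args) {
        // 'K' - 1 = 'J', 'F' - 1 = 'E', 'C' - 1 = 'B'
        System.out.println(decrypt(new char[]{'K', 'F', 'C'}, 4, 3)); // prints "JEB"
    }
}
```

The point of the new feature is that dexdec no longer needs such a hand-written model: the native emulator evaluates the real routine.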

Preparation

Currently, the native emulator is disabled by default. In order to let dexdec use it, edit your dexdec-emu.cfg file (located in your coreplugins/ folder, or in the GUI, Android menu, handler Emulator Settings…):

  • Mandatory: set enable_native_code_emulator to true
  • Recommended: increase the values of emu_max_duration and emu_max_itercount (the reason being that the analysis of native images by the native code plugins can be quite time-consuming).

You will also need a JEB Pro license to use this feature.

Output

As usual, the auto-decryption of an item will also emit an event, which can be collected programmatically and is visible in the Decompiler’s “Events” fragment in the GUI.

Items whose address is formatted as @LIB:<lib.so>@NativeAddress are decrypted native items that were found in the SO image at some point.

Decrypted strings collected by the decompiler

Similarly, decrypted items found in decompiled code are rendered using a purple’ish pink (by default) in the GUI.

If native code was involved in the decryption, the on-hover pop-up will let you know:

Decryption of that string required emulation of native code

API

The native emulator(s) managed by a dexdec’s IDState can be customized with the following newly-added methods and types:

  • enableNativeCodeEmulator / isNativeCodeEmulatorEnabled : enable or disable the native emulator (the master setting is pulled from your config file, dexdec-emu.cfg)
  • registerNativeEmulatorHooks / unregisterNativeEmulatorHooks : hooks into the evaluation (emulation) of the native code – refer to the appropriate hooks interfaces. The hooks receive a reference to the controlling EEmulator.
  • registerNativeMemoryHooks / unregisterNativeMemoryHooks : hooks into the memory accesses of the emulator’s state – refer to the appropriate hooks interfaces. The hooks receive a reference to the target EState object.

Conclusion

Interfacing both emulators offers many possibilities to improve the reverse-engineering experience of complex binaries and applications.

There is more that can be done, which will be discussed in further blog posts:

  • Retrieval of statically registered natives (through JNIEnv’s RegisterNatives) as opposed to native routines automatically resolved using the JNI naming conventions.
  • Automatic unpacking of native code.
  • Use of the native emulator in custom scripts and plugins.

Note that this feature is currently limited to JEB Pro.

The JNI native code emulator will work with x86, x64, and arm64 code (we may add support for arm in the near future). Needless to say, it is still in experimental mode! Therefore, you may encounter strange results or problems while analyzing code making use of it. Please send us error reports to support@pnfsoftware.com.

Until next time, and once again, thank you to our amazing users for their continued support and kind words 🙂 — Nicolas.

IR and AST Optimizers in Decompilers

The following is a small guide that will help users writing decompiler plugins decide whether they need to work at the IR (Intermediate Representation) level or at the AST (Abstract Syntax Tree) level. The recommendations apply to both JEB decompiler engines, dexdec (for Android Dex/Dalvik) and gendec (generic decompiler engine).

Decompilation Pipeline

A method undergoing decompilation goes through the following simplified pipeline:

  1. The low-level native code (machine code or bytecode) is converted to low-level IR
  2. Some augmentations take place, including SSA transformation and typing
  3. IR processors lift and clean the low-level IR
  4. The final high-level IR is converted to an AST
  5. AST processors clean and beautify the code
  6. The final AST is rendered as pseudo-code

The steps 3 (IR processing) and 5 (AST processing) are customizable by the user through JEB’s API. Indeed, custom plugins are sometimes necessary to perform work not done by JEB’s built-in optimizers.

IR vs AST

The following comparison between IR and AST will help you decide which plugin is better suited to perform some type of work.

  • The number of IR elements to deal with is substantially smaller than the AST counterpart. As such, it may be easier to learn at first. The AST being more abstract and closer to final pseudo code, there are necessarily more types of elements (e.g. a Break element, representing a break; statement, does not exist at the IR level). However, modifying IR statements requires more care than modifying the AST.
  • The IR of a method is a flat sequence of instructions, organized into basic blocks. The flow of execution between the blocks is clear and concise. On the other hand, the AST being a tree, its navigation is not as straightforward as a flat IR listing. While the concept of blocks exists, they are not necessarily basic blocks, and the flow of execution in the AST is not trivial to determine.
  • A consequence of the above is that data analysis is easier done at the IR level than at the AST level. The IR framework provides Data Flow Analysis objects with easy-to-use ways to determine where and by what variables are being accessed. This is a fundamental prerequisite for many non-trivial optimizers whose goal is code cleaning or restructuring (e.g. constant and variable propagation, dead code elimination, etc.).
  • Continuing the above, the IR framework generally offers more facilities and helpers to perform advanced optimization, such as deobfuscation. Examples: dexdec offers an emulator and sandbox engine at the IR level, something unavailable at the AST level; gendec offers a pattern-matching facility that makes the development of complex IR rewriting rules easy.
  • The AST is closer to the final generated pseudo-code. As such, it is a place of choice to perform final beautification or clean-up passes. High-level clean-up, requiring the insertion of AST elements with no IR equivalents, can only be done at the AST level.

Generally, working at the AST level will seem more approachable and an easier entry point to writing decompiler plugins. However, in most cases, IR processors will be better suited to perform non-trivial optimizations and deobfuscation.
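As a rough illustration of why data-flow driven clean-up is convenient on a flat instruction list, here is a toy dead-assignment eliminator. The Insn class and the pass are invented for this sketch and are unrelated to the actual dexdec or gendec IR classes; the pass assumes a single basic block and side-effect-free right-hand sides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy flat IR: each instruction optionally defines one variable and reads some others.
final class ToyIrDeadCode {
    static final class Insn {
        final String def;        // variable written, or null (e.g. for a return)
        final String text;       // printable form
        final Set<String> uses;  // variables read

        Insn(String def, String text, String... uses) {
            this.def = def;
            this.text = text;
            this.uses = new HashSet<>(Arrays.asList(uses));
        }
    }

    // Repeatedly drop assignments whose target is not read by any remaining instruction.
    static List<Insn> eliminateDeadAssignments(List<Insn> block) {
        List<Insn> live = new ArrayList<>(block);
        boolean changed = true;
        while (changed) {
            changed = false;
            Set<String> used = new HashSet<>();
            for (Insn insn : live) {
                used.addAll(insn.uses);
            }
            for (int k = 0; k < live.size(); k++) {
                Insn insn = live.get(k);
                if (insn.def != null && !used.contains(insn.def)) {
                    live.remove(k);   // nobody reads this value: remove and start over
                    changed = true;
                    break;
                }
            }
        }
        return live;
    }

    static List<Insn> sample() {
        return Arrays.asList(
            new Insn("a", "a = 1"),
            new Insn("b", "b = a + 2", "a"),
            new Insn("c", "c = 7"),            // only read by d
            new Insn("d", "d = c * 2", "c"),   // never read
            new Insn(null, "return b", "b"));
    }

    public static void main(String[] args) {
        for (Insn insn : eliminateDeadAssignments(sample())) {
            System.out.println(insn.text);
        }
    }
}
```

On the sample block, the pass first removes d, which in turn makes c dead; only a, b and the return remain. Doing the same on a tree of nested statements would first require working out the execution order of the nodes.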

Development

For dexdec, IR and AST plugins can be developed as compiled jars, or plugin scripts (Java or Python). Plugin scripts are extremely convenient for quick prototyping. See example code in your JEB coreplugins/scripts/ folder.

For gendec, IR and AST plugins can be developed as compiled jars only. Support for plugin scripts will come soon.

Resources

This blog contains several tutorials on how to get started with writing IR and AST plugins for both dexdec and gendec.

You will also find examples in this GitHub repository.

API Reference: dexdec IR, dexdec AST, gendec IR, gendec AST

Reversing dProtect

In this post, we’re having a look at the first release of dProtect (v 1.0) by Romain Thomas. dProtect is a fork of ProGuard that provides four additional self-explanatory configuration flags:

  • -obfuscate-strings
  • -obfuscate-constants
  • -obfuscate-arithmetic
  • -obfuscate-control-flow (via flattening & opaque predicates — unfortunately, I was unable to get this flag to work, so it’s something we’ll have to revisit in the future.)

Let’s see how JEB’s dexdec’s built-in optimizers as well as custom IR plugins can be used to defeat some implementations of strings obfuscation, constants obfuscation, and arithmetic operations obfuscation.

Strings Obfuscation

The test method is as follows:

// targeted by: -obfuscate-strings
public String provideString() {
    return "hello dProtect";
}

Let’s disable dexdec’s built-in deobfuscators (CTRL+TAB to decompile, untick “Enable deobfuscators”) to get a chance to look at the obfuscated code. It decompiles to:

public static String a(String arg14) {
    StringBuilder v0 = new StringBuilder();
    int v1 = ((int)DPTest1.b[4]) ^ 1684628051;
label_8:
    while(v1 < arg14.length()) {
        int v2 = arg14.charAt(v1);
        while(true) {
            int v9 = v2 ^ -1;
            v0.append(((char)((((int)DPTest1.b[10]) ^ 0x2AE022E9) + v2 + (((int)DPTest1.b[3]) ^ 0x35A299BD) + (((int)DPTest1.b[10]) ^ 0x2AE022E9 ^ -1 | v9) - ((((int)DPTest1.b[10]) ^ 0x2AE022E9) + v2 - ((((int)DPTest1.b[10]) ^ 0x2AE022E9) + v2 + (((int)DPTest1.b[3]) ^ 0x35A299BD) + (((int)DPTest1.b[10]) ^ 0x2AE022E9 ^ -1 | v9))))));
            long[] v3 = DPTest1.b;
            int v6 = v1 ^ -1;
            v1 = v1 + (((int)v3[3]) ^ 0x35A299BD) + (((int)v3[3]) ^ 0x35A299BD) + (((int)v3[3]) ^ 0x35A299BD ^ -1 | v6) + ((((int)v3[3]) ^ 0x35A299BD) + v1 - ((((int)v3[3]) ^ 0x35A299BD) + v1 + (((int)v3[3]) ^ 0x35A299BD) + (((int)v3[3]) ^ 0x35A299BD ^ -1 | v6)));
            if((DPTest1.a + (((int)v3[3]) ^ 0x35A299BD)) % (((int)v3[7]) ^ 0x2B0F969A) != 0) {
                continue label_8;
            }
        }
    }

    return v0.toString();
}

public String provideString() {
    return DPTest1.a("歬歡歨歨歫欤歠歔歶歫歰歡歧歰");
}

A decryptor method a(String):String was generated by dProtect. It performs various computations to decrypt the input string.

One built-in optimizer that ships with JEB’s dexdec uses the IDState object to perform emulation (explained in a previous blog). It cleans up such code automatically:

provideString() is auto-deobfuscated by JEB’s dexdec
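For readers curious about what the emulation boils down to, the decryptor can be simplified by hand. The sketch below is a stand-alone model, not output from JEB: the class and method names are made up, the three table values are copied from the class initializer listed later in this post, and the encrypted literal is written with Unicode escapes on the assumption that the characters in the listing above were transcribed faithfully.

```java
// Stand-alone model of the generated decryptor a(String).
final class DProtectStringSketch {
    // DPTest1.b[3], DPTest1.b[4] and DPTest1.b[10]
    static final long B3 = 0x35A299BCL;
    static final long B4 = 1684628051L;
    static final long B10 = 0x2AE049EDL;

    // The per-character expression, kept structurally close to the decompiled code.
    static char obfuscated(int c) {
        int k = ((int) B10) ^ 0x2AE022E9;      // 0x6B04
        int one = ((int) B3) ^ 0x35A299BD;     // 1
        int or = k + c + one + (~k | ~c);      // equals k | c
        return (char) (or - (k + c - or));     // (k | c) - (k & c), that is k ^ c
    }

    // The same routine once the arithmetic identities are applied: a plain XOR loop.
    static String decrypt(String in) {
        int key = ((int) B10) ^ 0x2AE022E9;
        StringBuilder sb = new StringBuilder();
        for (int i = ((int) B4) ^ 1684628051; i < in.length(); i++) {  // i starts at 0
            sb.append((char) (in.charAt(i) ^ key));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String enc = "\u6B6C\u6B61\u6B68\u6B68\u6B6B\u6B24\u6B60\u6B54"
                   + "\u6B76\u6B6B\u6B70\u6B61\u6B67\u6B70";
        System.out.println(decrypt(enc)); // prints "hello dProtect"
    }
}
```

The long expression in the listing is therefore a mixed boolean-arithmetic encoding of a single XOR with the key 0x6B04, and the loop counter update is an encoded increment by one.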

Arithmetic Operations Obfuscation

The test method is as follows:

// targeted by: -obfuscate-arithmetic
public int calculate(int x) {
    return 100 + x;
}

With standard JEB settings (re-tick “Enable deobfuscators” if you had disabled it), the obfuscated code decompiles to:

static {
    long[] v0 = new long[12];
    DPTest1.b = v0;
    v0[0] = 0x371C2961L;
    v0[1] = 0x13DD5724L;
    v0[2] = 0x17EB3014L;
    v0[3] = 0x35A299BCL;
    v0[4] = 1684628051L;
    v0[5] = 1720310111L;
    v0[6] = 0x576F77CBL;
    v0[7] = 0x2B0F9698L;
    v0[8] = 360862103L;
    v0[9] = 0x5A9D6037L;
    v0[10] = 0x2AE049EDL;
    v0[11] = 2060383159L;
    DPTest1.a = ((int)v0[11]) ^ 1305664179;
}

public int calculate(int arg4) {
    return arg4 + (((int)DPTest1.b[0]) ^ 0x371C2905);
}

As can be seen, the constant 100 has been replaced by an arithmetic operation, here, a XOR operating on an immediate and a static array element set up in the class initializer.
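The arithmetic is easy to check by hand: 0x371C2961 ^ 0x371C2905 = 0x64 = 100. The same check as a small Java sketch (the class name is illustrative; only the two constants come from the listing above):

```java
// Recomputes the constant hidden behind DPTest1.b[0] in calculate().
final class ArithmeticFoldSketch {
    static final long B0 = 0x371C2961L;  // value stored in DPTest1.b[0]

    static int hiddenConstant() {
        return ((int) B0) ^ 0x371C2905;
    }

    static int calculate(int x) {
        return x + hiddenConstant();
    }

    public static void main(String[] args) {
        System.out.println(hiddenConstant()); // prints 100
    }
}
```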

JEB does not ship with overly complex deobfuscators operating on arrays, because it is near-impossible in the general case to assess their finality (i.e. answer the question “will values be changed during the program execution?” definitively). However, to solve particular cases of obfuscation, writing a custom IR plugin to tackle this obfuscation is an acceptable solution. (Have a look at this post to get started on dexdec IR plugins.)

Let’s check DOptUnsafeArrayAccessSubst.java, a sample IR plugin that ships with JEB (folder coreplugins/scripts/) and does exactly what we need: detecting the use of static array elements and replacing them by their actual values. We can enable the plugin by removing the “.DISABLED” extension. Now redecompile (CTRL+TAB). And… well, nothing has changed! It is time to examine the plugin code carefully, maybe even use your favorite IDE to troubleshoot and augment it. Here is what prevented the original plugin from kicking in: the plugin was looking for IR elements such as: IDArrayElt ^ IDImm. However, the IR it got was: (<int>IDArrayElt) ^ IDImm, that is, the array element was cast to int, making the IR expression an IDOperation, not an IDArrayElt.

The DOptUnsafeArrayAccessSubstV2.java plugin takes care of that (refer to isLikeArrayElt method).

Now we can redecompile, and things are deobfuscated as expected:

calculate() is deobfuscated by DOptUnsafeArrayAccessSubstV2

Constants Scrambling

Finally, let’s have a look at how constants obfuscation is achieved. The documentation gives examples of cryptographic-like S-boxes being initialized. The test method is as follows:

// targeted by: -obfuscate-constants
public void initArray(int[] a) {
    a[0] = 0x61707865;
    a[1] = 0x3320646e;
    a[2] = 0x79622d32;
    a[3] = 0x6b206574;
}

Out of the box, JEB decompiles the obfuscated code to:

static {
    long[] v0 = new long[12];
    DPTest1.b = v0;
    v0[0] = 0x371C2961L;
    v0[1] = 0x13DD5724L;
    v0[2] = 0x17EB3014L;
    v0[3] = 0x35A299BCL;
    v0[4] = 1684628051L;
    v0[5] = 1720310111L;
    v0[6] = 0x576F77CBL;
    v0[7] = 0x2B0F9698L;
    v0[8] = 360862103L;
    v0[9] = 0x5A9D6037L;
    v0[10] = 0x2AE049EDL;
    v0[11] = 2060383159L;
    DPTest1.a = ((int)v0[11]) ^ 1305664179;
}

public void initArray(int[] arg5) {
    long[] v0 = DPTest1.b;
    arg5[1684628051 ^ ((int)v0[4])] = 133800250 ^ ((int)v0[5]);
    arg5[0x35A299BD ^ ((int)v0[3])] = 0x644F13A5 ^ ((int)v0[6]);
    arg5[0x2B0F969A ^ ((int)v0[7])] = 0x6CE07CA5 ^ ((int)v0[8]);
    arg5[0x13DD5727 ^ ((int)v0[1])] = ((int)v0[9]) ^ 0x31BD0543;
}

Note that synthetic static arrays are used again, as was the case for the arithmetic operations obfuscation pass. Therefore, let’s try the DOptUnsafeArrayAccessSubstV2 plugin. As a careful examination of the above code may suggest, the plugin fails to deobfuscate this code on the first go. The reason: if you examine the IR produced while debugging the plugin, you will notice that the static array elements are accessed via a variable (v0, above). In IR, those elements are IDVar. Therefore, we need to check whether this variable references a static array. We will do that by using the data flow analysis facility made available to all dexdec plugins (public field dfa of optimizers sub-classing AbstractDOptimizer):

...
analyzeChains();  // initialize the `dfa` member field
Long defaddr = dfa.checkSingleDef(insnAddress, varid);  // use-def chains
...
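The idea behind that check can be modeled outside of JEB. The sketch below is a deliberately simplified toy: its classes are not dexdec’s IR or data flow analysis API, and its singleDef helper only counts definitions in the whole method, whereas a real use-def chain is computed relative to the address of the use. The table values and the immediates are those of the listings above.

```java
import java.util.Arrays;
import java.util.List;

// Toy model: resolve `((int)v[index]) ^ imm` when `v` provably aliases DPTest1.b.
final class ToySingleDef {
    static final long[] B = {
        0x371C2961L, 0x13DD5724L, 0x17EB3014L, 0x35A299BCL,
        1684628051L, 1720310111L, 0x576F77CBL, 0x2B0F9698L,
        360862103L, 0x5A9D6037L, 0x2AE049EDL, 2060383159L
    };

    static final class Insn {
        final String defVar;  // variable assigned by this instruction
        final String source;  // what is assigned, e.g. "DPTest1.b"

        Insn(String defVar, String source) {
            this.defVar = defVar;
            this.source = source;
        }
    }

    // Index of the only instruction assigning `var`, or -1 if there is none or several.
    static int singleDef(List<Insn> insns, String var) {
        int found = -1;
        for (int i = 0; i < insns.size(); i++) {
            if (var.equals(insns.get(i).defVar)) {
                if (found >= 0) {
                    return -1;  // more than one definition: ambiguous
                }
                found = i;
            }
        }
        return found;
    }

    // Returns the folded constant, or null when the origin of `var` is unknown.
    static Integer fold(List<Insn> insns, String var, int index, int imm) {
        int def = singleDef(insns, var);
        if (def < 0 || !"DPTest1.b".equals(insns.get(def).source)) {
            return null;  // leave the expression untouched
        }
        // "Unsafe" substitution: assumes the array is never modified after <clinit>
        return ((int) B[index]) ^ imm;
    }

    public static void main(String[] args) {
        List<Insn> initArray = Arrays.asList(new Insn("v0", "DPTest1.b"));
        // arg5[1684628051 ^ ((int)v0[4])] = 133800250 ^ ((int)v0[5]);
        System.out.println(fold(initArray, "v0", 4, 1684628051));                      // prints 0
        System.out.println(Integer.toHexString(fold(initArray, "v0", 5, 133800250))); // prints 61707865
    }
}
```

Applied to the four statements of initArray(), this yields indices 0 to 3 and the original constants 0x61707865, 0x3320646e, 0x79622d32 and 0x6b206574.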

The improved plugin can be found here: DOptUnsafeArrayAccessSubstV3.java

The obfuscated code is now processed as expected, and dexdec generates the following decompilation:

initArray() is deobfuscated by DOptUnsafeArrayAccessSubstV3

Conclusion and Future Work

dProtect is a great project to provide code obfuscation for the masses. Its compatibility with ProGuard makes integration into new and existing Android projects a breeze. I have little doubt many developers will try it out in the future. Let’s see how upcoming upgrades to the obfuscators fare against the decompiler!

In future blogs, we will have a look at dProtect’s control-flow obfuscation (once I’ve got it to work!) and we will see how O-MVLL, the LLVM-based native code obfuscator counterpart, does against JEB’s gendec (generic decompiler for native code).

Until next time! – Nicolas

Dart AOT snapshot helper plugin to better analyze Flutter-based apps

Update: Oct 18 2023: as of JEB 5.4, JEB can parse Dart AOT snapshots version 2.10 to 3.1.

Update: Oct 5 2022: as of JEB 4.20, this plugin generates IDartAotUnit objects, easily accessible by API.

A “Dart AOT Snapshot” unit generated by the plugin (JEB >= 4.20), along with unit documents.

The original post can be found below:

JEB 4.17 ships with a Dart AOT (ahead-of-time) binary snapshot helper plugin to help with the analysis of pre-compiled Dart programs. A common use case for it may be to offer directions when reverse engineering Flutter apps compiled for Android x86/x64 or arm/aarch64 platforms.

Snapshots in ELF

Release-mode Flutter-based Android apps will generate AOT snapshots instead of shipping with bytecode or Dart code, like Debug-mode apps may choose to. The AOT snapshot contains a state of the Dart VM required to run the pre-compiled code.

A snapshot is generally located in the lib/<arch>/libapp.so files of an APK. Since Dart may be used outside of Flutter, or since the file name or location may change, a reliable way to locate such files is to look for an ELF SO file exporting the following 4 symbols:

_kDartVmSnapshotInstructions
_kDartIsolateSnapshotInstructions
_kDartVmSnapshotData
_kDartIsolateSnapshotData
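Assuming the exported symbol names of a library have already been obtained by some other means (for instance from the output of an ELF symbol dumping tool), the check itself is a simple set inclusion test. The helper below is a hypothetical sketch and is not part of the plugin:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;

// Decides whether a list of exported symbol names matches a Dart AOT snapshot container.
final class DartAotSymbolCheck {
    static final List<String> REQUIRED = Arrays.asList(
        "_kDartVmSnapshotInstructions",
        "_kDartIsolateSnapshotInstructions",
        "_kDartVmSnapshotData",
        "_kDartIsolateSnapshotData");

    static boolean looksLikeDartAot(Collection<String> exportedSymbols) {
        // all four symbols must be present; extra exports are ignored
        return new HashSet<>(exportedSymbols).containsAll(REQUIRED);
    }

    public static void main(String[] args) {
        System.out.println(looksLikeDartAot(REQUIRED));                        // prints true
        System.out.println(looksLikeDartAot(Arrays.asList("JNI_OnLoad")));     // prints false
    }
}
```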

The XxxSnapshotInstructions symbols point to pre-compiled machine code. However, getting a starting point when dealing with stripped or obfuscated binaries may prove difficult. The XxxSnapshotData symbols point to Dart VM structures and objects that will be accessed by the executing code. That includes data elements such as pooled strings or arrays of immediate values. Snapshot data also include important metadata that will help restructure the hundreds or thousands of routines compiled in an AOT snapshot.

Using the Plugin

First, make sure that you are dealing with Dart AOT snapshots or with a Flutter app containing precompiled AOT snapshots. Indeed, other types of snapshots exist, such as JIT snapshots. The plugin does not provide help for those. In practice, non-AOT snapshots may be relatively easy to analyze, but you are unlikely to encounter them in the wild. Most Dart code or Flutter apps will be compiled and distributed in release mode. At best, some symbols and optional metadata may be left over. At worst, most will have been obfuscated (refer to Flutter’s --obfuscate option).

The plugin will automatically kick in and analyze AOT snapshots generated by Dart 2.10 (~Fall 2020) to Dart 2.17 (current at the time of writing). The analysis results will be placed in text sub-units located under the elf container unit. The code unit will be annotated (methods will be renamed, etc.), as explained in the next sections.

An aarch64 ELF file containing Dart AOT snapshots. The plugin generated reports in the dart_aot_snapshots sub-unit folder. Other information is embedded directly into the native code unit itself (e.g. renamed routines, re-packaged routines, extra comments, etc.).

Textual Information

AOT snapshots contain lots of information. Deserializing them is relatively complicated, not to mention the fact that each revision of Dart changes the format — meaning that support will have to be added for Dart 2.18+ when that version ships… The plugin does not extract every potentially available bit of information. What is made available at this time is:

1- Basic information about the snapshots, such as version and features

Basic information about AOT snapshots

2- The list of libraries, classes, and methods

Classes, methods, libraries present in a snapshot. Here, we can see that most names were obfuscated.

3- A view of the primary pool strings

Pooled items (including strings), some of which may be used by the natively executed code.

Code Annotations

Aside from static information, the plugin also attempts to:

1- Rename methods. Release builds will strip the method names from the ELF file. However, the AOT snapshot information references all AOT methods as well as their names, classes, library, etc. The names provided in the snapshot information will be applied to unnamed native routines.

You will be able to locate the main method, the entry-point of all Dart applications.

2- Annotate access to pooled strings. Native code accesses pooled items through a fixed register (containing an address into a pointer array to pooled elements). Below is a list of registers for the most common architectures:

arm     : register r5
aarch64 : register x27
x64     : register r15

Pooled strings accessed on x64 binaries are marked as a meta-comment in the code unit, as follows:

0x1BFF / 8 (pointer size on 64-bit arch.) = 0x37F = 895
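The computation in the caption above can be written as a one-line helper. This is only a sketch of that arithmetic (the names are made up, and it assumes the displacement is the one applied to the pool register, with integer division discarding the low bits):

```java
// Maps a displacement off the pool register to a pool item index.
final class PoolIndexSketch {
    static int poolIndex(int displacement, int pointerSize) {
        return displacement / pointerSize;  // integer division
    }

    public static void main(String[] args) {
        System.out.println(poolIndex(0x1BFF, 8)); // prints 895
    }
}
```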

Unfortunately, due to how the assembly code for arm64 binaries is generated, those comments cannot be generated on such binaries. However, decompilation will yield slightly more digestible code, e.g.:

Pooled string access on an arm64 binary

Caveats & Conclusion

We recommend analyzing x64 or arm64 binaries, instead of their 32-bit x86 or arm counterparts, since the plugin may not parse everything properly in the latter cases. In particular, the functions are not mapped properly for arm 32-bit snapshots generated by recent versions of Dart (2.16’ish and above).

More could be done, in particular related to calling conventions (for proper decompilation), pseudo-code refactoring and restructuring (via gendec IR plugins for instance), and library code flagging (e.g. classes and their methods belonging to dart::<well_known_namespace> could be made to stand out visually). Such additional features will be added depending on the feedback and the needs of the users. Please let us know your feedback via the usual means (Twitter, email, Slack).

Finally, thanks to Axelle Apvrille (@cryptax) for flagging Dart as something that JEB may be able to help with!

Further Reading

Discussion of the internal formats and binary details of AOT snapshots was out-of-scope in this blog. Readers interested in digging further should check the following resources:

Miscellaneous

A generated mapping of Dart snapshot’s version hashes to git version tags can be found here: https://gist.github.com/nfalliere/84803aef37291ce225e3549f3773681b

Thank you for reading, until next time! – Nicolas