Control-flow unflattening in the wild

Both JEB decompiler engines 1 ship with code optimizers capable of rebuilding methods whose control-flow was transformed by flattening obfuscators.

Image © Tigress (University of Arizona)

Control-flow flattening, sometimes referred to as chenxification2, is an obfuscation technique employed to destructure a routine control-flow. While a compiled routine is typically composed of a number of basic blocks having low ingress and egress counts, a flattened routine may exhibit an outlier node having high input and high output edge counts, and generally, a very high centrality in the graph (in terms of vertex betweenness). Practically speaking, the original method M is reduced to a many-way conditional block H evaluating an expression VPC, dispatching the flow of execution to units of code, each one performing a part of M, updating VPC, and looping back to H. In effect, the original structured code is reduced to a large switch-like block, whose execution is guided by a synthetic variable VPC. Therefore, the original flow of control, critical to infer meaning while performing manual reverse-engineering, is lost. 3

We upgraded dexdec‘s control flow unflattener earlier this year. 4 The v2 of the unflattener is more generic than our original implementation. It is able to cover cases in which the obfuscated does not map to the clean model presented above, e.g. cases where the dispatcher stands out.

This week, we encountered an instance of code that was auto-deobfuscated to clean code and thought it’d be a good example to show how useful generic deobfuscation of such code can be. It seems that the obfuscator that was used to protect the original code was BlackObfuscator, a project used by clean apps and malware alike.

Hash: 92ae23580c83642ad0e50f19979b9d2122f28d8b3a9d4b17539ce125ae8d93eb

Before deobfuscation.

After deobfuscation, the code looks like:

After deobfuscation.

If you encounter examples where the unflattener does not perform adequately, please let us know. We’ll see if they can be fixed or upgraded to cover obfuscation corner-cases.

Thank you & until next time — Nicolas.

  1. dexdec is JEB’ dex/dalvik decompiler, gendec is JEB’s generic decompiler used for native code and any code other than dex/dalvik
  2. A term coined by University of Arizona’s Pr. Christian Collberg for the fact that an early description of this technique was presented by Dr. Chenxi Wang in her PhD thesis
  3. Control-flow flattening can be seen as a particular case of code virtualization, which was covered in previous blog entries.
  4. JEB 4.25 released on Jan 17 2023

Published by

Nicolas Falliere

Author of JEB.

Leave a Reply

Your email address will not be published. Required fields are marked *