{"id":3765,"date":"2020-07-08T14:03:18","date_gmt":"2020-07-08T22:03:18","guid":{"rendered":"https:\/\/www.pnfsoftware.com\/blog\/?p=3765"},"modified":"2021-09-10T14:15:24","modified_gmt":"2021-09-10T22:15:24","slug":"reversing-android-protector-virtualization","status":"publish","type":"post","link":"https:\/\/www.pnfsoftware.com\/blog\/reversing-android-protector-virtualization\/","title":{"rendered":"Reversing an Android app Protector, Part 3 &#8211; Code Virtualization"},"content":{"rendered":"\n<p><em>In this series: <a href=\"https:\/\/www.pnfsoftware.com\/blog\/reversing-dexguard\/\">Part 1<\/a>, <a href=\"https:\/\/www.pnfsoftware.com\/blog\/reversing-dexguard-encryption\/\">Part 2<\/a>, Part 3<\/em><\/p>\n\n\n\n<p>The third part of this series is about bytecode virtualization. The analyses that follow were done statically.<\/p>\n\n\n\n<p>Bytecode virtualization is the most interesting and technically challenging feature of this protector.<\/p>\n\n\n\n<p><strong>TL;DR:<br>&#8211; JEB Pro can un-virtualize protected methods.<br>&#8211; A <em>Global Analysis<\/em> (Android menu) will point you to p-code VM routines.<br>&#8211; Make sure to disable <em>Parse Exception Blocks<\/em> when decompiling such methods.<br>&#8211; For even clearer results, rename the method&#8217;s opaque predicates to <em>guard0\/guard1<\/em> (refer to part 1 of this series for details).<\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What Is Code Virtualization<\/h2>\n\n\n\n<p>Relatively novel, code virtualization is possibly one of the most effective protection techniques there is <sup class='footnote'><a href='#fn-3765-1' id='fnref-3765-1' onclick='return fdfootnote_show(3765)'>1<\/a><\/sup>. It comes with relatively heavy disadvantages, such as slower execution <sup class='footnote'><a href='#fn-3765-2' id='fnref-3765-2' onclick='return fdfootnote_show(3765)'>2<\/a><\/sup> and difficulty troubleshooting production code. 
The advantage is a significantly higher reverse-engineering hurdle than with more traditional software protection techniques.<\/p>\n\n\n\n<p>Virtualization in the context of code protection means:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Generating a virtual machine <em>M1<\/em><\/li><li>Translating an original code object <em>C0<\/em>, meant to be executed on a machine <em>M0<\/em> <sup class='footnote'><a href='#fn-3765-3' id='fnref-3765-3' onclick='return fdfootnote_show(3765)'>3<\/a><\/sup>, into a semantically equivalent code object <em>C1<\/em> to be run on <em>M1<\/em>.<\/li><\/ul>\n\n\n\n<p>While the general features of <em>M1<\/em> are likely to be fixed (e.g., all generations of <em>M1<\/em> are stack machines with such and such characteristics), the Instruction Set Architecture (ISA) of <em>M1<\/em> need not be. For example, opcodes, microcodes and their implementation may vary from generation to generation. As for <em>C1<\/em>, the characteristics of a generation are constrained only by the capabilities of the converter. Needless to say, standard obfuscation techniques can also be applied to <em>C1<\/em>. 
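<\/p>\n\n\n\n<p>To make the definition concrete, here is a toy sketch in Java. Everything in it is hypothetical (class name, operations, opcode numbering) and unrelated to the protector&#8217;s actual VM: the abstract operations of <em>M1<\/em> stay fixed, a per-generation seed shuffles their numeric opcodes, <code>translate()<\/code> plays the converter turning <em>C0<\/em> into <em>C1<\/em>, and <code>run()<\/code> plays <em>M1<\/em>. Mirroring the protector&#8217;s ISA described further below, each instruction is a bare opcode with no embedded operand.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>
```java
import java.util.Random;

// Toy sketch (hypothetical, not the protector's design): the abstract operations of M1
// stay the same, but every generation assigns them different opcode numbers.
final class ToyVirtualizer {
    // Abstract M1 operations. LOADn pushes argument n, so no instruction carries an operand.
    static final int LOAD0 = 0, LOAD1 = 1, LOAD2 = 2, ADD = 3, XOR = 4, RET = 5;
    static final int NUM_OPS = 6;

    // Per-generation encoding: enc[op] is the opcode number emitted for op.
    static int[] encodingFor(long generationSeed) {
        int[] enc = new int[NUM_OPS];
        for (int i = 0; i < NUM_OPS; i++) enc[i] = i;
        Random rnd = new Random(generationSeed);
        for (int i = NUM_OPS - 1; i > 0; i--) { // Fisher-Yates shuffle
            int j = rnd.nextInt(i + 1);
            int t = enc[i]; enc[i] = enc[j]; enc[j] = t;
        }
        return enc;
    }

    // Converter: translate C0 (abstract ops) into C1 (this generation's p-code).
    static int[] translate(int[] c0, int[] enc) {
        int[] c1 = new int[c0.length];
        for (int i = 0; i < c0.length; i++) c1[i] = enc[c0[i]];
        return c1;
    }

    // M1: a stack machine that decodes opcodes through the inverse table.
    static int run(int[] c1, int[] enc, int[] args) {
        int[] dec = new int[NUM_OPS];
        for (int op = 0; op < NUM_OPS; op++) dec[enc[op]] = op;
        int[] stack = new int[16];
        int sp = 0;
        for (int pc = 0; pc < c1.length; pc++) {
            switch (dec[c1[pc]]) {
                case LOAD0: stack[sp++] = args[0]; break;
                case LOAD1: stack[sp++] = args[1]; break;
                case LOAD2: stack[sp++] = args[2]; break;
                case ADD: sp--; stack[sp - 1] += stack[sp]; break;
                case XOR: sp--; stack[sp - 1] ^= stack[sp]; break;
                case RET: return stack[sp - 1];
                default: throw new IllegalStateException();
            }
        }
        throw new IllegalStateException(); // fell off the end without RET
    }
}
```
<\/code><\/pre>\n\n\n\n<p>Encoding the same <em>C0<\/em> (here, <code>(a + b) ^ c<\/code>) with two different seeds generally yields two different p-code arrays that compute the same result, which illustrates why knowing one generation&#8217;s opcode numbering does not necessarily help with the next. 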
The virtualization process can possibly be recursive (<em>C1<\/em> could be a VM implementing the specifications of a machine <em>M2<\/em>, executing a code object <em>C2<\/em>, emulating the original behavior of <em>C0<\/em>, etc.).<\/p>\n\n\n\n<p>All in all, in practice, this makes <em>M1<\/em> and <em>C1<\/em> unique and hard to reverse-engineer.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"868\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-1024x868.png\" alt=\"\" class=\"wp-image-3779\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-1024x868.png 1024w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-300x254.png 300w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-768x651.png 768w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-1536x1301.png 1536w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image.png 1932w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption>Before and after virtualization of a code object C0 into C1<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Example of a Protected Method<\/h2>\n\n\n\n<p><em>Note: all identifier names had been obfuscated. They were renamed for clarity and understanding.<\/em><\/p>\n\n\n\n<p>Below, the class <code>VClass<\/code> was found to be &#8220;virtualized&#8221;. 
In a virtualized class, all methods except constructors and static initializers (<code>&lt;init&gt;(*)V<\/code> and <code>&lt;clinit&gt;()V<\/code>) were virtualized.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-15.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"605\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-15-1024x605.png\" alt=\"\" class=\"wp-image-3820\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-15-1024x605.png 1024w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-15-300x177.png 300w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-15-768x454.png 768w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-15-1536x908.png 1536w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-15.png 1870w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption>Interestingly, the constructors were not virtualized<\/figcaption><\/figure>\n\n\n\n<p>The method <code>d(byte[])byte[]<\/code> is virtualized:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>It was converted into an interpreter loop over two large switch constructs that branch on p-code entries stored in the local array <code>pcode<\/code>.<\/li><li>A <code>PCodeVM<\/code> class was added. 
It is a modified stack-based virtual machine (more below) that performs basic load\/store operations, custom loads\/stores, as well as some arithmetic, binary and logical operations.<\/li><\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-17.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"439\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-17-1024x439.png\" alt=\"\" class=\"wp-image-3822\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-17-1024x439.png 1024w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-17-300x129.png 300w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-17-768x329.png 768w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-17-1536x659.png 1536w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-17.png 1726w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption>Virtualized method. Note the pcode array. The opcode handlers are located in two switches. This picture shows the second switch, used to handle specific operations and API calls.<\/figcaption><\/figure>\n\n\n\n<p>A snippet of the p-code VM class. 
<a href=\"https:\/\/gist.github.com\/nfalliere\/209e1c27e358eb3cd22595b7326409eb\">Full code here<\/a>; the gist also contains the virtualized class.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-18.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-18-640x1024.png\" alt=\"\" class=\"wp-image-3823\" width=\"559\" height=\"894\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-18-640x1024.png 640w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-18-187x300.png 187w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-18-768x1229.png 768w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-18-960x1536.png 960w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-18.png 1114w\" sizes=\"auto, (max-width: 559px) 100vw, 559px\" \/><\/a><figcaption>The generic interpreter is called via <code>vm.exec(opcode)<\/code>. If the operation was not handled there, execution falls back to the second switch, in the virtualized method.<\/figcaption><\/figure>\n\n\n\n<p>Please refer to the gist linked above for a full list of &#8220;generic&#8221; VM operations. 
Three examples, including one showing that the operations are not as generic as the term implies:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-19.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"118\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-19-1024x118.png\" alt=\"\" class=\"wp-image-3824\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-19-1024x118.png 1024w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-19-300x35.png 300w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-19-768x89.png 768w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-19.png 1480w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption>(specific to this VM) opcode 6, used to peek the most recently pushed object<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-20.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"117\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-20-1024x117.png\" alt=\"\" class=\"wp-image-3825\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-20-1024x117.png 1024w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-20-300x34.png 300w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-20-768x88.png 768w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-20-1536x176.png 1536w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-20.png 1538w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption>(specific to this VM) opcode 8, a push-int operation<\/figcaption><\/figure>\n\n\n\n<figure 
class=\"wp-block-image size-large\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-21.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"146\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-21-1024x146.png\" alt=\"\" class=\"wp-image-3826\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-21-1024x146.png 1024w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-21-300x43.png 300w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-21-768x110.png 768w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-21-1536x219.png 1536w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-21.png 2005w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption>(specific to this VM) opcode 23 is relatively specialized: it implements an add-xor stack operation (pop, pop, push). Interestingly, the protection system does not simply generate one-to-one, Dalvik-to-VM opcodes. Instead, the target routine is thoroughly analyzed (most likely lifted), high-level (compound) arithmetic operations are isolated, and pseudo-generic (in PCodeVM) or specialized (in the virtualized method) opcodes are generated. <\/figcaption><\/figure>\n\n\n\n<p>Negative opcodes represent custom operations specific to a virtualized method, including control-flow changes. 
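<\/p>\n\n\n\n<p>The two-tier dispatch can be illustrated with a minimal Java sketch. It is hypothetical: <code>ToyPCodeVm<\/code>, <code>VirtualizedMax<\/code> and the hand-written p-code are made up, only an int stack is modeled, and the opcode numbers 55 (GE) and -25 (conditional branch) are merely borrowed from the figures to echo them. Every opcode is first offered to the generic VM; opcodes it does not handle fall through to a switch inside the virtualized method.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>
```java
// Toy sketch of the two-tier dispatch (hypothetical names and encodings).
final class ToyPCodeVm {
    private final int[] ints = new int[32]; // int stack only; the real VM has five
    private int top;                         // index of the next free slot

    void pushInt(int v) { ints[top++] = v; }
    int popInt() { return ints[--top]; }

    // Semi-generic handlers (positive ids). Returns false when the opcode is not
    // handled here, so that the caller can try its own method-specific switch.
    boolean exec(int opcode) {
        switch (opcode) {
            case 55: { // GE: pop b, pop a, push (a >= b ? 1 : 0)
                int b = popInt();
                int a = popInt();
                pushInt(a >= b ? 1 : 0);
                return true;
            }
            default:
                return false;
        }
    }
}

final class VirtualizedMax {
    // Hand-written p-code for: return a >= b ? a : b
    private static final int[] PCODE = { -1, -2, -25, -2, -9, -1, -9 };

    static int max(int a, int b) {
        ToyPCodeVm vm = new ToyPCodeVm();
        int pc = 0;
        while (true) {
            int op = PCODE[pc++];
            if (vm.exec(op)) continue;          // tier 1: generic VM
            switch (op) {                        // tier 2: specific to this method
                case -1: vm.pushInt(a); break;   // push parameter a
                case -2: vm.pushInt(b); break;   // push parameter b
                case -25:                        // if (a >= b) goto 5
                    vm.exec(55);
                    if (vm.popInt() != 0) pc = 5;
                    break;
                case -9: return vm.popInt();     // return the top of the stack
                default: throw new IllegalStateException();
            }
        }
    }
}
```
<\/code><\/pre>\n\n\n\n<p>In the protected app, such method-specific handlers live in the virtualized method&#8217;s second switch. 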
An example:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-22.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"156\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-22-1024x156.png\" alt=\"\" class=\"wp-image-3829\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-22-1024x156.png 1024w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-22-300x46.png 300w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-22-768x117.png 768w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-22-1536x234.png 1536w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-22.png 1875w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption>opcode -25: an <code>if (a &gt;= b) goto LABEL<\/code> operation (first, it calls into opcode 55 to perform a GE comparison of the top two integers; then, it uses the result for conditional branching)<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Characteristics of the P-code VM<\/h2>\n\n\n\n<p>From the analysis of that code, as well as of virtualized methods found in other binaries, the characteristics of the p-code VM generated by the app protector can be inferred:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>The VM is a hybrid stack machine that uses 5 parallel stacks of the same height, stored in arrays of:<ul><li>java.lang.Object (accommodating all objects, including arrays)<\/li><li>int (accommodating all small integers, including boolean and char)<\/li><li>long<\/li><li>float<\/li><li>double<\/li><\/ul><\/li><li>For each of the 5 stack types above, the VM uses two additional registers for storing and loading<\/li><li>Two stack pointers are used: one indicates the top of the stack; the other seems to be used more liberally and is akin to a peek register 
<\/li><li>The stack has a reserved area to store the virtualized method&#8217;s parameters (including <code>this<\/code> if the method is non-static)<\/li><li>The ISA encoding is trivial: each instruction is exactly one word long and consists solely of the opcode of the p-code instruction to execute. Unlike most stack-machine ISAs, no register, index, or immediate value is embedded into the instruction.<\/li><li>Because the ISA is so simple, the semantics of an instruction are implemented almost entirely by its p-code handler. For this reason, handlers are grouped into two categories:<ul><li>Semi-generic VM operations (load\/store, arithmetic, binary, tests) are handled by the VM class and have a positive id. (Every virtualized method in a virtualized class uses a VM object.)<\/li><li>Operations specific to a given virtualized method (e.g., method invocations) use negative ids and are handled within the virtualized method itself.<\/li><\/ul><\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">P-code obfuscation: junk insertion, spaghetti code<\/h3>\n\n\n\n<p>While the <code>PCodeVM<\/code> opcodes are all &#8220;useful&#8221;, many specific opcodes of a virtualized method (negative ids) do nothing but execute code semantically equivalent to <code>NOP<\/code> or <code>GOTO<\/code>.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-23.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"268\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-23-1024x268.png\" alt=\"\" class=\"wp-image-3831\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-23-1024x268.png 1024w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-23-300x79.png 300w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-23-768x201.png 768w, 
https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-23.png 1271w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption>opcodes -2, -1: essentially branching instructions. Many of them can be found, including some that branch to blocks whose only predecessor is that source (i.e., an unnecessary GOTO, which amounts to spaghetti code, or a NOP if the target block is the fall-through block).<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Rebuilding Virtualized Methods<\/h2>\n\n\n\n<p>Below, we explain the process used to rebuild a virtualized method. The CFGs presented are IR-CFGs (IR: Intermediate Representation) used by the <em>dexdec<\/em> <sup class='footnote'><a href='#fn-3765-4' id='fnref-3765-4' onclick='return fdfootnote_show(3765)'>4<\/a><\/sup> pipeline. Note that unlike <em>gendec<\/em>&#8217;s IR <sup class='footnote'><a href='#fn-3765-5' id='fnref-3765-5' onclick='return fdfootnote_show(3765)'>5<\/a><\/sup>, <em>dexdec<\/em>&#8217;s IR is not exposed publicly, but its textual representation is mostly self-explanatory.<\/p>\n\n\n\n<p>Overall, once processed by dexdec like any other routine, a virtualized routine looks as follows: a loop over p-code entries (stored in <code>x8<\/code> below), each processed first by <code>a()<\/code> at 0xE, or else by the routine&#8217;s large switch.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-3.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"57\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-3-1024x57.png\" alt=\"\" class=\"wp-image-3801\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-3-1024x57.png 1024w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-3-300x17.png 300w, 
https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-3-768x43.png 768w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-3-1536x85.png 1536w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-3-2048x113.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption>Virtualized method, optimized, virtualized<\/figcaption><\/figure>\n\n\n\n<p>The routine <code>a()<\/code> is PCodeVM.exec(); its optimized IR boils down to a single large switch. <sup class='footnote'><a href='#fn-3765-6' id='fnref-3765-6' onclick='return fdfootnote_show(3765)'>6<\/a><\/sup><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-4.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"15\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-4-1024x15.png\" alt=\"\" class=\"wp-image-3802\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-4-1024x15.png 1024w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-4-300x4.png 300w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-4-768x11.png 768w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-4-1536x22.png 1536w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-4-2048x29.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption>PCodeVM.exec()<\/figcaption><\/figure>\n\n\n\n<p>To get started, the unvirtualizer needs to identify key items, such as the p-code entries and the identifiers used as indices into the p-code array. Once these have been gathered, concolic execution of the virtualized routine becomes possible, which allows rebuilding a raw version of the original execution flow. 
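<\/p>\n\n\n\n<p>A heavily simplified sketch of the flow-rebuilding step follows. It does not perform concolic execution: it assumes (hypothetically) that the possible successors of every p-code entry are already known, which is the kind of information that executing the handlers provides, and derives the basic blocks of the original flow from them. The class and method names are made up for illustration.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>
```java
import java.util.Arrays;

// Hypothetical sketch: succ[pc] lists the p-code indices that may run after entry pc
// (assumed known here; recovering it is what executing the handlers is for).
final class PcodeBlocks {
    // Returns the basic-block id of every p-code index, or -1 if it is unreachable.
    static int[] blockIds(int[][] succ, int entry) {
        int n = succ.length;
        boolean[] reached = new boolean[n];
        boolean[] leader = new boolean[n];
        int[] work = new int[n]; // each index is pushed at most once
        int sp = 0;
        reached[entry] = true;
        leader[entry] = true;
        work[sp++] = entry;
        while (sp > 0) {
            int pc = work[--sp];
            int[] s = succ[pc];
            // Unless pc simply falls through to pc + 1, each successor starts a block.
            boolean fallsThrough = s.length == 1 ? s[0] == pc + 1 : false;
            for (int t : s) {
                if (!fallsThrough) leader[t] = true;
                if (!reached[t]) {
                    reached[t] = true;
                    work[sp++] = t;
                }
            }
        }
        int[] ids = new int[n];
        Arrays.fill(ids, -1);
        int current = -1;
        for (int pc = 0; pc < n; pc++) {
            if (!reached[pc]) continue;
            if (leader[pc]) current++; // otherwise pc extends the block of pc - 1
            ids[pc] = current;
        }
        return ids;
    }
}
```
<\/code><\/pre>\n\n\n\n<p>Unreachable entries are reported as -1; reachable junk, such as the NOP- or GOTO-like opcodes discussed earlier, still needs to be cleaned up by later passes. 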
Multiple caveats must be taken care of, such as p-code inlining, branching, and flow termination. <strong>In its current state, the unvirtualizer disregards exceptional control flow.<\/strong><\/p>\n\n\n\n<p>Below is a raw version of the unflattened CFG. Note that all operations are stack-based: the code itself has not been modified at this point and still consists of VM stack-based operations.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-8.png\"><img loading=\"lazy\" decoding=\"async\" width=\"84\" height=\"1024\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-8-84x1024.png\" alt=\"\" class=\"wp-image-3807\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-8-84x1024.png 84w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-8-126x1536.png 126w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-8-168x2048.png 168w\" sizes=\"auto, (max-width: 84px) 100vw, 84px\" \/><\/a><figcaption>Virtualized method after unflattening, raw<\/figcaption><\/figure><\/div>\n\n\n\n<p>dexdec&#8217;s standard IR optimization passes (dead-code removal, constant and variable propagation, folding, arithmetic simplification, flow simplifications, etc.) 
clean up the code substantially:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-11.png\"><img loading=\"lazy\" decoding=\"async\" width=\"540\" height=\"1024\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-11-540x1024.png\" alt=\"\" class=\"wp-image-3811\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-11-540x1024.png 540w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-11-158x300.png 158w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-11-768x1456.png 768w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-11-810x1536.png 810w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-11-1081x2048.png 1081w\" sizes=\"auto, (max-width: 540px) 100vw, 540px\" \/><\/a><figcaption>Virtualized method after unflattening and IR optimizations (opt1)<\/figcaption><\/figure>\n\n\n\n<p>At this stage, all operations are still stack-based. High-level code generated from the above would be quite unwieldy and difficult to analyze, although substantially better than the original double switch.<\/p>\n\n\n\n<p>The next stage is to analyze stack-based operations to recover stack-slot uses and convert them back to identifiers (which can be viewed as virtual registers; essentially, this converts stack-based operations into register-based ones). Stack analysis can be done in a variety of ways, for example, using fixed-point analysis. Again, several caveats apply; in particular, properly identifying the stacks as well as their indices is crucial for this operation. 
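<\/p>\n\n\n\n<p>One way to perform such an analysis, sketched below under simplifying assumptions, is a worklist propagation of stack heights over a flat lattice. The stack effect and successors of every instruction are assumed to be known, and a single height per program point suffices because the VM&#8217;s parallel stacks all have the same height (which of the five stacks a slot belongs to must still be determined separately). The names are hypothetical, and this is not dexdec&#8217;s implementation.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>
```java
import java.util.Arrays;

// Hypothetical sketch: pops[i], pushes[i] and succ[i] describe instruction i.
final class StackToRegisters {
    // Worklist propagation over a flat lattice: a height is assigned once, and any
    // other path reaching the same point must agree with it.
    static int[] entryHeights(int[] pops, int[] pushes, int[][] succ, int entry, int h0) {
        int n = pops.length;
        int[] height = new int[n];
        Arrays.fill(height, -1); // -1 = not reached yet
        int[] work = new int[n]; // each instruction is pushed at most once
        int sp = 0;
        height[entry] = h0;
        work[sp++] = entry;
        while (sp > 0) {
            int i = work[--sp];
            int base = height[i] - pops[i];
            if (base < 0) throw new IllegalStateException(); // stack underflow
            int out = base + pushes[i];
            for (int s : succ[i]) {
                if (height[s] == -1) {
                    height[s] = out;
                    work[sp++] = s;
                } else if (height[s] != out) {
                    throw new IllegalStateException(); // inconsistent heights at a merge
                }
            }
        }
        return height;
    }

    // Operand k (0 = deepest) of instruction i is read from slot base + k, and its
    // first result is written to slot base, where base = height[i] - pops[i].
    // Naming slot s as virtual register v_s turns stack code into register code.
    static int operandSlot(int[] height, int[] pops, int i, int k) {
        return height[i] - pops[i] + k;
    }

    static int resultSlot(int[] height, int[] pops, int i) {
        return height[i] - pops[i];
    }
}
```
<\/code><\/pre>\n\n\n\n<p>For example, with two reserved parameter slots, the sequence push, push, add, return has entry heights 2, 3, 4, 3, so the addition reads slots 2 and 3 and writes slot 2, i.e., <code>v2 = v2 + v3<\/code>. 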
<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-12.png\"><img loading=\"lazy\" decoding=\"async\" width=\"485\" height=\"1024\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-12-485x1024.png\" alt=\"\" class=\"wp-image-3812\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-12-485x1024.png 485w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-12-142x300.png 142w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-12-768x1623.png 768w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-12-727x1536.png 727w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-12-969x2048.png 969w\" sizes=\"auto, (max-width: 485px) 100vw, 485px\" \/><\/a><figcaption>Virtualized method after unflattening, IR optimizations, VM stack analysis (opt2)<\/figcaption><\/figure>\n\n\n\n<p>After another round of optimizations:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-13.png\"><img loading=\"lazy\" decoding=\"async\" width=\"528\" height=\"1024\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-13-528x1024.png\" alt=\"\" class=\"wp-image-3813\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-13-528x1024.png 528w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-13-155x300.png 155w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-13-768x1489.png 768w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-13-792x1536.png 792w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-13-1056x2048.png 1056w\" sizes=\"auto, (max-width: 528px) 100vw, 528px\" \/><\/a><figcaption>Virtualized method after 
unflattening, IR optimizations, VM stack analysis, IR optimizations (opt2_1)<\/figcaption><\/figure>\n\n\n\n<p>Once the stack analysis is complete, we can replace stack slot accesses by identifier accesses.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-14.png\"><img loading=\"lazy\" decoding=\"async\" width=\"372\" height=\"1024\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-14-372x1024.png\" alt=\"\" class=\"wp-image-3814\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-14-372x1024.png 372w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-14-109x300.png 109w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-14-768x2112.png 768w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-14-745x2048.png 745w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-14.png 1572w\" sizes=\"auto, (max-width: 372px) 100vw, 372px\" \/><\/a><figcaption>Virtualized method after unflattening, IR optimizations, VM stack analysis, IR optimizations, virtual registers insertion  (opt3)<\/figcaption><\/figure>\n\n\n\n<p>After a round of optimizations:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-10.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"266\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-10-1024x266.png\" alt=\"\" class=\"wp-image-3809\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-10-1024x266.png 1024w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-10-300x78.png 300w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-10-768x200.png 768w, 
https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-10-1536x399.png 1536w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-10-2048x532.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption>Virtualized method after unflattening, IR optimizations, VM stack analysis, IR optimizations, virtual registers insertion, IR optimizations (opt3)<\/figcaption><\/figure>\n\n\n\n<p>At this point, the &#8220;original&#8221; CFG is essentially reconstructed, and other advanced deobfuscation passes (e.g., emulation-based deobfuscators) can be applied.<\/p>\n\n\n\n<p>High-level code generation yields a clean, unvirtualized routine:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-24.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"545\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-24-1024x545.png\" alt=\"\" class=\"wp-image-3837\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-24-1024x545.png 1024w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-24-300x160.png 300w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-24-768x409.png 768w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-24.png 1383w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption>High-level code, unvirtualized, unmarked<\/figcaption><\/figure>\n\n\n\n<p>After reversing, it appears to be a modified RC4 algorithm. 
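<\/p>\n\n\n\n<p>For comparison, below is textbook RC4 (key scheduling followed by keystream generation) in Java. It is only a reference baseline, not the protector&#8217;s modified variant shown in the figure below; the class name is made up.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>
```java
// Textbook RC4 (KSA + PRGA), used here only as a baseline for comparison.
final class Rc4Baseline {
    static byte[] crypt(byte[] key, byte[] data) {
        int[] s = new int[256];
        for (int i = 0; i < 256; i++) s[i] = i;
        // Key-scheduling algorithm (KSA)
        for (int i = 0, j = 0; i < 256; i++) {
            j = (j + s[i] + (key[i % key.length] & 0xFF)) & 0xFF;
            int t = s[i]; s[i] = s[j]; s[j] = t;
        }
        // Pseudo-random generation algorithm (PRGA), XORed with the input
        byte[] out = new byte[data.length];
        for (int n = 0, i = 0, j = 0; n < data.length; n++) {
            i = (i + 1) & 0xFF;
            j = (j + s[i]) & 0xFF;
            int t = s[i]; s[i] = s[j]; s[j] = t;
            out[n] = (byte) (data[n] ^ s[(s[i] + s[j]) & 0xFF]);
        }
        return out;
    }
}
```
<\/code><\/pre>\n\n\n\n<p>Diffing the recovered routine against such a baseline makes the protector&#8217;s deviations stand out. 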
Note the +3\/+4 added to the key.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-25.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"486\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-25-1024x486.png\" alt=\"\" class=\"wp-image-3838\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-25-1024x486.png 1024w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-25-300x142.png 300w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-25-768x364.png 768w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-25-1536x729.png 1536w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/07\/image-25.png 1583w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption>High-level code, unvirtualized, marked<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Detecting Virtualized Methods<\/h2>\n\n\n\n<p>All versions of JEB detect virtualized methods and classes: run <em>Global Analysis<\/em> (GUI menu: <em>Android<\/em>) on your APK\/DEX and look for those special events:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/06\/image-32.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"143\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/06\/image-32-1024x143.png\" alt=\"\" class=\"wp-image-3748\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/06\/image-32-1024x143.png 1024w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/06\/image-32-300x42.png 300w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/06\/image-32-768x107.png 768w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/06\/image-32-1536x215.png 1536w, 
https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/06\/image-32.png 2025w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption>dexdec event:<br><em>&#8220;Found Virtualized routine handler (P-Code VM)&#8221;<\/em><\/figcaption><\/figure>\n\n\n\n<p><em>JEB Pro<\/em> version 3.22 <sup class='footnote'><a href='#fn-3765-7' id='fnref-3765-7' onclick='return fdfootnote_show(3765)'>7<\/a><\/sup> ships with the unvirtualizer module.<\/p>\n\n\n\n<p>Tips:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Make sure the deobfuscators are enabled, as well as Unvirtualization (enabled by default in the options).<\/li><li><strong>The try-blocks analysis must be disabled for the class to unvirtualize.<\/strong> (Use MOD1+TAB to redecompile and untick &#8220;Parse Exception Blocks&#8221;.)<\/li><li>After a first decompilation pass, it may help to identify guard0\/guard1, rename them, and decompile again; otherwise, opaque-predicate (OP) obfuscation will remain and make the code unnecessarily difficult to read. (Refer to part 1 of this series to learn what renaming those fields to these special names does when a protected app is detected.)<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>We hope you enjoyed this third installment on code (un)virtualization.<\/p>\n\n\n\n<p>There may be a fourth and final chapter to this series on native code protection. Until next time!<\/p>\n\n\n\n<p>&#8212;<\/p>\n\n\n<div class='footnotes' id='footnotes-3765'><div class='footnotedivider'><\/div><ol><li id='fn-3765-1'> On a personal note, my first foray into VM-based protection dates back to 2009 with the analysis of Trojan.Clampi, a piece of Windows malware protected with VMProtect. <span class='footnotereverse'><a href='#fnref-3765-1'>&#8617;<\/a><\/span><\/li><li id='fn-3765-2'> One could argue, though, that with current hardware (fast x64\/ARM64 processors) and software (JIT and AOT compilers), that drawback is not as relevant as it used to be. 
<span class='footnotereverse'><a href='#fnref-3765-2'>&#8617;<\/a><\/span><\/li><li id='fn-3765-3'> <em>Machine<\/em> here may be understood as either a physical or a virtual machine. <span class='footnotereverse'><a href='#fnref-3765-3'>&#8617;<\/a><\/span><\/li><li id='fn-3765-4'> dexdec is JEB&#8217;s dex decompiler engine. <span class='footnotereverse'><a href='#fnref-3765-4'>&#8617;<\/a><\/span><\/li><li id='fn-3765-5'> gendec is JEB&#8217;s generic decompilation pipeline. <span class='footnotereverse'><a href='#fnref-3765-5'>&#8617;<\/a><\/span><\/li><li id='fn-3765-6'> Note the similarities with CFGs flattened by chenxification and similar techniques. One key difference here is that the next block may be determined using the p-code array instead of a key variable updated after each operation. In other words, the question is whether the FSM controlling the next state (i.e., the next basic block) is embedded in the flattened code itself or implemented as a p-code array. <span class='footnotereverse'><a href='#fnref-3765-6'>&#8617;<\/a><\/span><\/li><li id='fn-3765-7'> <em>JEB Android<\/em> and JEB demo builds do not ship the unvirtualizer module. I initially wrote this module as a proof-of-concept not intended for release, but eventually decided to offer it to our professional users who have legitimate (non-malicious) use cases, e.g. code audits and black-box assessments. <span class='footnotereverse'><a href='#fnref-3765-7'>&#8617;<\/a><\/span><\/li><\/ol><\/div>","protected":false},"excerpt":{"rendered":"<p>In this series: Part 1, Part 2, Part 3 The third part of this series is about bytecode virtualization. The analyses that follow were done statically. Bytecode virtualization is the most interesting and technically challenging feature of this protector. 
TL;DR:&#8211; JEB Pro can un-virtualize protected methods.&#8211; A Global Analysis (Android menu) will point you to &hellip; <a href=\"https:\/\/www.pnfsoftware.com\/blog\/reversing-android-protector-virtualization\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Reversing an Android app Protector, Part 3 &#8211; Code Virtualization<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[15,3,18,5],"tags":[],"class_list":["post-3765","post","type-post","status-publish","format-standard","hentry","category-android","category-decompilation","category-jeb3","category-obfuscation"],"_links":{"self":[{"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/posts\/3765","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/comments?post=3765"}],"version-history":[{"count":0,"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/posts\/3765\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/media?parent=3765"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/categories?post=3765"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/tags?post=3765"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}