{"id":1001,"date":"2019-01-23T11:13:08","date_gmt":"2019-01-23T19:13:08","guid":{"rendered":"https:\/\/www.pnfsoftware.com\/blog\/?p=1001"},"modified":"2019-01-23T11:13:08","modified_gmt":"2019-01-23T19:13:08","slug":"jeb-native-pipeline-ir-optimizers-part-2","status":"publish","type":"post","link":"https:\/\/www.pnfsoftware.com\/blog\/jeb-native-pipeline-ir-optimizers-part-2\/","title":{"rendered":"JEB Native Analysis Pipeline \u2013 Part 2: IR Optimizers"},"content":{"rendered":"\n<p>In <a href=\"https:\/\/www.pnfsoftware.com\/blog\/jeb-native-pipeline-intermediate-representation\/\">part 1<\/a> of this series, we gave an overview of the Intermediate Representation used by JEB&#8217;s Native Analysis Pipeline, as well as a simple Python script demonstrating how to use the API to access and print out IR-CFG of decompiled routines.<\/p>\n\n\n\n<p>In part 2, we continue our exploration of JEB IR. We will show how to write a custom IR optimizer plugin to clean-up a custom obfuscation used in a piece of code. The resulting decompiled C code will end up very readable as well.<\/p>\n\n\n\n<p><strong>Before you proceed, make sure to update JEB Pro to version 3.1.1+.<\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Obfuscated Crypto-stealer Code<\/h2>\n\n\n\n<p><a href=\"https:\/\/github.com\/pnfsoftware\/jeb-native-ir-optimizer-example1\/blob\/master\/samples\/1.zip\">The sample<\/a> we are going to look at monitors Windows clipboards for cryptocurrency-looking wallet addresses, and replaces them with a desired target address. The sample is specifically targeting Ethereum wallet addresses. 
It is a neutered final stage payload &#8211; the recipient address has been scrambled to render the code ineffective.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"698\" height=\"516\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-4.png\" alt=\"\" class=\"wp-image-1018\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-4.png 698w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-4-300x222.png 300w\" sizes=\"auto, (max-width: 698px) 100vw, 698px\" \/><figcaption>PE characteristics of file 1.exe<\/figcaption><\/figure>\n\n\n\n<p>Although the payload is unpacked, what is interesting is that one of its key routines is obfuscated: custom garbage code was inserted.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"722\" height=\"356\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-5.png\" alt=\"\" class=\"wp-image-1019\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-5.png 722w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-5-300x148.png 300w\" sizes=\"auto, (max-width: 722px) 100vw, 722px\" \/><figcaption>Junk (useless) assignments<\/figcaption><\/figure>\n\n\n\n<p>The garbage code is easy to go through: a bit of manual analysis shows that junk instructions are assigning pseudo-random values to an array whose bytes are never used. Two types of assembly patterns are present:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">1- mov dword ptr [edi + offset], junk_value ; edi previously init. 
to<br>                                            ; junkarray address<br>2- push junk_value<br>   pop dword ptr [junkarray_address + offset]<\/pre>\n\n\n\n<p>If we decompile that code and look at the final IR (as shown below), we can see that those instructions ended up being converted and optimized to the following type of assignment:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">Assign(Mem(mem_address), Imm(junk_value))<\/pre>\n\n\n\n<p>Currently, the decompiled code looks like the following, hard-to-digest blob:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"856\" height=\"457\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-6.png\" alt=\"\" class=\"wp-image-1020\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-6.png 856w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-6-300x160.png 300w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-6-768x410.png 768w\" sizes=\"auto, (max-width: 856px) 100vw, 856px\" \/><figcaption>Snippet of decompiled code (obfuscated)<\/figcaption><\/figure>\n\n\n\n<p>Although quite painful to read, we can follow the program&#8217;s logic by abstracting away the junk assignments. (Essentially, the Win32 functions OpenClipboard, GetClipboardData, and SetClipboardData are used to retrieve, check, and replace copy-pasted ASCII and Unicode text that matches the following pattern: &#8220;\/0x(..){20}\/&#8221;. 
The replacement string is the target wallet address, previously decrypted by sub_401000.)<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Cleaning the Intermediate Representation<\/h2>\n\n\n\n<p>Recall that the native analysis pipeline can be simplified as the workflow below:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">CodeObject (*)<br>  -> Reconstructed Routines &amp; Data<br>    -> Conversion to IR (low-level, non-optimized)<br>      -> <strong><u>IR Optimizations<\/u><\/strong> <em>&lt;--- this is where we'll work<\/em><br>        -> Final IR (higher-level, optimized, typed) <br>          -> Generation of AST<br>            -> AST Optimizations<br>              -> Final AST (final, cleaned) <br>                -> High-level output (eg, C variant)<\/pre>\n\n\n\n<p>Our custom IR optimizer will look for junk assignments and remove them. The important questions are: What are the junk array&#8217;s start and end addresses? Is the array common to all routines in the binary, or is there one array per routine? Those questions may be hard to answer in the general case. However, for our specific sample file, we can assert with a high degree of certainty that the junk array:<br>&#8211; starts at address 0x415882<br>&#8211; is at most 256 bytes long<br>&#8211; is used solely by sub_401171, the routine we want to analyze<\/p>\n\n\n\n<p>Because of the above restrictions, the IR optimizer we are going to write should be qualified as a <em>custom<\/em> or <em>ad-hoc<\/em> IR optimizer. 
Chances are, we won&#8217;t be able to reuse it as-is in other programs without some amount of tweaking.<\/p>\n\n\n\n<p>Let&#8217;s get started, we will:<br>&#8211; create an Eclipse project with scaffold code for a Java back-end plugin<br>&#8211; write and test a custom IR optimizer with a headless client<br>&#8211; deploy the plugin and make it usable and accessible from the UI desktop client<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Creating a Plugin Project<\/h3>\n\n\n\n<p>Before we proceed, make sure to:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Define an environment variable <strong>JEB_HOME<\/strong>, that points to your JEB installation folder<\/li><li>Install <a href=\"https:\/\/www.eclipse.org\/downloads\/packages\/release\/2018-12\/r\/eclipse-ide-java-developers\">Eclipse IDE<\/a><\/li><\/ul>\n\n\n\n<p>Then:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Clone the <a href=\"https:\/\/github.com\/pnfsoftware\/jeb-native-ir-optimizer-example1\">jeb-native-ir-optimizer-example1 repository<\/a>.<\/li><li>Create an Eclipse project by running:<br><ul><li>On Windows: create-eclipse-project.cmd<\/li><li>On Linux\/macOS: create-eclipse-project.sh<\/li><\/ul><\/li><li>Open Eclipse and import the newly-created project into your Workspace (<em>File<\/em>, <em>Import<\/em>, <em>Existing Projects into the Workspace<\/em>, select the cloned repository folder, proceed)<\/li><\/ul>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"709\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-8-1024x709.png\" alt=\"\" class=\"wp-image-1022\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-8-1024x709.png 1024w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-8-300x208.png 300w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-8-768x532.png 768w, 
https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-8.png 1048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Importing an existing project into Eclipse<\/figcaption><\/figure><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Debugging the Obfuscation<\/h3>\n\n\n\n<p>Now that your project is imported in Eclipse, you should be able to see two source files in src&#8217;s default package:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Tester.java<\/li><li>EOptExample1.java<\/li><\/ul>\n\n\n\n<p><em>EOptExample1<\/em> is the IR optimizer plugin we will be working on. (Note that several classes of plugins exist; this one is a native IR optimizer, and therefore inherits from <a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/units\/code\/asm\/decompiler\/ir\/opt\/AbstractEOptimizer.html\">AbstractEOptimizer<\/a> or one of its subclasses.)<\/p>\n\n\n\n<p><em>Tester<\/em> creates a headless JEB instance that loads the plugin EOptExample1.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"484\" height=\"480\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-9.png\" alt=\"\" class=\"wp-image-1023\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-9.png 484w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-9-150x150.png 150w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-9-300x298.png 300w\" sizes=\"auto, (max-width: 484px) 100vw, 484px\" \/><figcaption>Package Explorer view of the newly-created project<\/figcaption><\/figure><\/div>\n\n\n\n<p>Tester.java does the following:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Create a JEB instance <sup class='footnote'><a href='#fn-1001-1' id='fnref-1001-1' onclick='return fdfootnote_show(1001)'>1<\/a><\/sup><\/li><li>Load the test plugin EOptExample1<\/li><li>Then, create 
a JEB project and load the artifact file samples\/1.exe (IMPORTANT: unzip 1.zip to 1.exe first &#8211; password: <em>password<\/em>)<\/li><li>Analyze the artifact<\/li><li>Retrieve a handle on the native decompiler<\/li><li>Retrieve a handle on the to-be-analyzed routine sub_401171<\/li><li>Perform a full decompilation of that routine<\/li><\/ul>\n\n\n\n<p>Let&#8217;s have a preliminary look at EOptExample1: This IR <a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/units\/code\/asm\/decompiler\/opt\/OptimizerType.html\">optimizer type<\/a> is set to STANDARD, which is not ideal when you use custom optimizers tailored for specific code. A better IR optimizer class for those is ON_DEMAND: those optimizers are to be manually invoked, e.g. from JEB UI (menu: <em>File<\/em>, <em>Advanced Unit Options<\/em>). However, during development, since we are focusing on a particular file and routine, STANDARD type may be fine. Standard optimizers are called during regular IR optimization phases of the decompilation pipeline.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: java; title: ; notranslate\" title=\"\">\npublic class EOptExample1 extends AbstractEOptimizer {\n\n    public EOptExample1() {\n        super(DataChainsUpdatePolicy.UPDATE_IF_OPTIMIZED);\n        getPluginInformation().setName(&quot;Sample IR Optimizer #1&quot;);\n        getPluginInformation().setDescription(&quot;Remove IR-statements reduced to \\&quot;*(&amp;garbage + delta) = xxx\\&quot;&quot;);\n        getPluginInformation().setVersion(Version.create(1, 0, 0));\n        \/\/ Standard optimizers are normally run, as part of the IR optimization stages in the decompilation pipeline\n        setType(OptimizerType.STANDARD);\n    }\n\n    \/\/ replace all IR statements previously reduced to EMem (&quot;&#x5B;junk_address] = xxx&quot;) to ENop\n    @Override\n    public int perform(boolean updateDFA) {\n        logger.info(&quot;IR-CFG 
before running custom optimizer \\&quot;%s\\&quot;:\\n%s&quot;, getName(),\n                DecompilerUtil.formatIRCFGWithContext(2, cfg, ectx));\n        \/\/ ...\n        \/\/ optimizer code\n    }\n}\n<\/pre><\/div>\n\n\n<p>Note the plugin&#8217;s data-chains update policy, set to <a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/units\/code\/asm\/decompiler\/ir\/opt\/DataChainsUpdatePolicy.html\">UPDATE_IF_OPTIMIZED<\/a>. Optimizations that specify this flag tell their runner, aka the master optimizer that orchestrates them, that identifiers may be modified &#8211; hence, if optimizations occurred, a data flow analysis (DFA) pass needs to take place again. DFA update policies are a topic for another article.<\/p>\n\n\n\n<p>Lines 5-7 set the plugin metadata: name, description, and version. (Plugin metadata may also include authorship and minimum\/maximum JEB back-end versions.)<\/p>\n\n\n\n<p>Before we deep-dive into perform(), let&#8217;s first set a breakpoint on line 15, where logger.info(&#8230;) is called. Then, start a debugging session for Tester: menu <em>Run<\/em>, command <em>Debug<\/em> (hotkey: F11).<\/p>\n\n\n\n<p>After a few seconds of analysis, your breakpoint should be hit; it corresponds to the first invocation of your custom optimizer. The logger prints out the IR-CFG that&#8217;s about to be optimized. 
Let&#8217;s have a look at it:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nIR-CFG before running custom optimizer &quot;Sample IR Optimizer #1&quot;:\n&gt;&gt; IN(@0): ecx={@D} esp={@0} ebp={@1} ss={@1,@C,@18,@1D,@21,@24,@25,@27,@30,@35,@38,@3B,@3E,@3F,@41,@43,@46,@4F,@51,@54,@56,@59,@5C,@5D,@5F,@6B,@77,@81,@84,@9B,@9E,@A0,@AC,@B8,@BA,@BD,@BF,@C3,@C5,@C7,@CB,@CD,@D1,@D2,@D4,@E0,@E9,@EF,@F1,@F5,@F7,@FB,@FC,@FE,@100,@103,@106,@107,@109,@10C,@10E,@112,@114,@116,@11A,@11C,@11F,@122,@123,@12E,@131,@133,@137,@139,@13D,@13E,@140,@143,@145,@149,@14B,@14F,@150,@152,@15D,@173,@176,@179,@17C,@17D,@17F,@181,@18A,@18C,@18F,@191,@194,@196,@19A,@19D,@19E,@1A0,@1B3,@1BF,@1C2,@1D9,@1DC,@1E0,@1EC,@1EF,@1F1,@1F3,@1F7,@1F9,@1FC,@1FF,@202,@203,@205,@211,@21D,@220,@222,@226,@227,@229,@22B,@22E,@231,@232,@234,@237,@23A,@23C,@23E,@242,@244,@246,@24A,@24C,@250,@251,@25C,@25F,@262,@263,@265,@268,@26A,@26E,@271,@272,@27A,@27F,@295,@298,@29A,@29D,@2A4,@2A9,@2AD,@2B0,@2B2,@2B6,@2B7,@2BA,@2BC,@2C0,@2C2,@2C6,@2C7,@2CA,@2CD,@2D2,@2DE,@2E4,@2E7,@2E8,@2EA} ds={@F,@11,@19,@1E,@22,@28,@31,@36,@39,@3C,@40,@44,@4C,@4E,@52,@55,@57,@5A,@60,@6C,@74,@78,@7B,@82,@85,@86,@87,@8E,@90,@92,@9C,@A1,@A3,@A4,@AD,@B5,@B7,@BB,@C0,@C4,@C8,@CE,@D5,@E1,@EA,@ED,@F2,@F8,@FD,@FF,@101,@104,@10A,@10F,@113,@117,@11B,@11D,@120,@124,@12F,@134,@13A,@13F,@141,@146,@14C,@153,@15A,@15C,@163,@165,@166,@167,@170,@171,@174,@177,@17A,@17E,@180,@187,@189,@18D,@192,@197,@19B,@1A1,@1AB,@1B4,@1B7,@1B9,@1C0,@1C3,@1C4,@1C5,@1CC,@1CE,@1D0,@1DA,@1DD,@1DE,@1E1,@1E9,@1ED,@1F0,@1F4,@1FA,@1FD,@200,@206,@212,@219,@21B,@21E,@223,@228,@22A,@22C,@22F,@235,@238,@23B,@23F,@243,@247,@24D,@252,@25B,@25D,@260,@264,@266,@26B,@26F,@273,@27B,@27E,@285,@287,@288,@289,@292,@293,@296,@29B,@2A5,@2AA,@2AE,@2B3,@2B8,@2BD,@2C3,@2C8,@2CE,@2D0,@2D3,@2DF,@2E1,@2E2,@2E5,@2EB} OpenClipboard={@25} GetClipboardData={@3F,@17D} GlobalAlloc={@FC,@227} GlobalLock={@107,@232} GlobalUnlock={@13E,@263} 
SetClipboardData={@150,@272,@2B7,@2C7} CloseClipboard={@2CB} Sleep={@2E8} sub_401000={@D} sub_405010={@5D} sub_404F80={@D2} sub_4024E0={@123,@251} sub_404E54={@19E} sub_404E14={@203} \n0000\/1&gt;  s32:_esp = (s32:_esp - i32:00000004h)                                                                 DU: esp={@1,@2,@B}                 | UD: esp={} \n0001\/1:  32&amp;lt;s16:_ss&gt;&#x5B;s32:_esp] = s32:_ebp                                                                      DU:                                | UD: esp={@0} ebp={} ss={} \n0002\/9:  s32:_ebp = s32:_esp                                                                                   DU: ebp={@38,@41,@46,@4F,@54,@56,@84,@9E,@B8,@BD,@C5,@FE,@100,@10C,@114,@11C,@131,@140,@15D,@176,@17F,@181,@18A,@18F,@194,@1C2,@1DC,@1EF,@1F1,@1FC,@229,@22B,@237,@23C,@244,@25C,@265,@27F,@298,@29D}  | UD: esp={@0} \n000B\/1:  s32:_esp = (s32:_esp - i32:0000002Ch)                                                                 DU: esp={@C,@D,@17}                | UD: esp={@0} \n000C\/1:  32&amp;lt;s16:_ss&gt;&#x5B;s32:_esp] = i32:0040117Ch                                                                 DU:                                | UD: esp={@B} ss={} \n000D\/1:  call s32:_sub_401000(s32:_ecx)-&gt;(s32:_eax){32&#x5B;s32:_esp]}                                              DU: eax={}                         | UD: ecx={} esp={@B} sub_401000={} \n000E\/1+  s32:_edi = i32:00415882h                                                                              DU: edi={}                         | UD: \n000F\/1:  32&amp;lt;s16:_ds&gt;&#x5B;i32:00415944h] = i32:E2E60682h                                                            DU:                                | UD: ds={} \n0010\/1:  s32:_eax = i32:00000001h                                                                              DU: eax={}                         | UD: \n0011\/6:  32&amp;lt;s16:_ds&gt;&#x5B;i32:00415904h] = i32:7C64C0E4h                                   
                         DU:                                | UD: ds={} \n0017\/1:  s32:_esp = (s32:_esp - i32:00000004h)                                                                 DU: esp={@18,@1A}                  | UD: esp={@B,@2EC} \n0018\/1:  32&amp;lt;s16:_ss&gt;&#x5B;s32:_esp] = i32:E87A1612h                                                                 DU:                                | UD: esp={@17} ss={} \n0019\/1:  32&amp;lt;s16:_ds&gt;&#x5B;i32:004158DDh] = i32:E87A1612h                                                            DU:                                | UD: ds={} \n001A\/1:  s32:_esp = (s32:_esp + i32:00000004h)                                                                 DU: esp={@1C}                      | UD: esp={@17} \n001B\/1:  nop                                                                                                   DU:                                | UD: \n001C\/1+  s32:_esp = (s32:_esp - i32:00000004h)                                                                 DU: esp={@1D,@20}                  | UD: esp={@1A} \n001D\/1:  32&amp;lt;s16:_ss&gt;&#x5B;s32:_esp] = i32:CCA4A4A0h                                                                 DU:                                | UD: esp={@1C} ss={} \n001E\/2:  32&amp;lt;s16:_ds&gt;&#x5B;i32:004158CAh] = i32:CCA4A4A0h                                                            DU:                                | UD: ds={} \n0020\/1:  s32:_esp = s32:_esp                                                                                   DU: esp={@21,@23}                  | UD: esp={@1C} \n0021\/1:  32&amp;lt;s16:_ss&gt;&#x5B;s32:_esp] = i32:00000000h                                                                 DU:                                | UD: esp={@20} ss={} \n0022\/1:  32&amp;lt;s16:_ds&gt;&#x5B;i32:00415951h] = i32:249E4228h                                                            DU:                                | UD: ds={} \n0023\/1:  s32:_esp = 
(s32:_esp - i32:00000004h)                                                                 DU: esp={@24,@25,@26}              | UD: esp={@20} \n0024\/1:  32&amp;lt;s16:_ss&gt;&#x5B;s32:_esp] = i32:004011CAh                                                                 DU:                                | UD: esp={@23} ss={} \n0025\/1:  call s32:_OpenClipboard(32&amp;lt;s16:_ss&gt;&#x5B;(s32:_esp + i32:00000004h)])-&gt;(s32:_eax){32&#x5B;s32:_esp]}            DU: eax={@33}                      | UD: esp={@23} ss={} OpenClipboard={} \n...\n... (trimmed)\n...\n<\/pre><\/div>\n\n\n<p>The above IR listing is a human-friendly representation of IR statements. The general format of this listing is:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">cnt   what<br><strong>1     >> IN(@EntryOffset){live_inputs}<br>1*    offset\/lengthC &lt;insn> | DU:&lt;def-use-chains> UD:&lt;use-def-chains><br>0+    &lt;&lt; OUT(@ExitOffset){reaching_outputs}<\/strong><br><br>- offset: IR statement offset<br>- length: IR statement length (generally, 1)<br>- C: indicates whether the instruction is<br>    - the entry-point instruction (>)<br>    - the first of a basic-block (+)<br>    - any other instruction (:)<br>- insn: IR statement instruction (refer to Part 1 of this blog series)<br>- DU\/UD: routine def-use and use-def chains<br>- IN: live input variables at the entry-point<br>- OUT: reaching output variables at a given exit point<\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"693\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-10-1024x693.png\" alt=\"\" class=\"wp-image-1029\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-10-1024x693.png 1024w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-10-300x203.png 300w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-10-768x519.png 768w, 
https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-10.png 1248w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>We set a breakpoint on logger.info() and single-stepped one line. The output can be seen in the console view. It may be better (depending on how large your console buffer is) to examine the full output dumped to jeb-plugin-tester.log in your Temp folder.<\/figcaption><\/figure>\n\n\n\n<p>The IR listing is relatively readable, although quite verbose at this early stage of optimization (roughly, the first pass in tier 1 of the analysis pipeline). The important idioms to look at here are:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"613\" height=\"315\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-11.png\" alt=\"\" class=\"wp-image-1030\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-11.png 613w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-11-300x154.png 300w\" sizes=\"auto, (max-width: 613px) 100vw, 613px\" \/><figcaption>Preliminary conversion of low-level junk inserts<\/figcaption><\/figure>\n\n\n\n<p>a\/ The first one is an Assign(Mem(Imm), Imm), which corresponds to an optimized &#8220;mov [edi + offset], value&#8221;, where the value of edi was determined, propagated further, and the addition folded and converted to an immediate address.<\/p>\n\n\n\n<p>b\/ The second one is a partially optimized &#8220;push value \/ pop [address]&#8221;. Later optimization phases will find and remove esp updates or esp-based operations, as was shown in the pseudo-code earlier. 
What we need to focus on here is the Assign(Mem(Imm), Imm), like the one in a\/.<\/p>\n\n\n\n<p>Those are the bits we will look for and modify: Assuming those assignments are useless, we will simply replace them with <a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/units\/code\/asm\/decompiler\/ir\/IENop.html\">Nop<\/a> statements.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Writing the Optimizer<\/h3>\n\n\n\n<p>At this point, our preliminary understanding of the obfuscation is enough to start writing the clean-up optimizer. Its code is extremely simple, for two main reasons:<br>&#8211; The obfuscation scheme itself is relatively trivial<br>&#8211; Other built-in JEB optimizers are giving us clean IR assignments to work on<\/p>\n\n\n\n<p>Let&#8217;s look at the code of perform():<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: java; title: ; notranslate\" title=\"\">\n    @Override\n    public int perform(boolean updateDFA) {\n        \/\/ bounds of the junk array, hard-coded for this specific sample\n        final long garbageStart = 0x415882;\n        final long garbageEnd = garbageStart + 0x100;\n        int cnt = 0;\n        for(int iblk = 0; iblk &lt; cfg.size(); iblk++) {\n            BasicBlock&lt;IEStatement&gt; b = cfg.get(iblk);\n            for(int i = 0; i &lt; b.size(); i++) {\n                IEStatement stm = b.get(i);\n                \/\/ only assignments are of interest\n                if(!(stm instanceof IEAssign)) {\n                    continue;\n                }\n                IEAssign asg = (IEAssign)stm;\n                \/\/ the destination must be a memory access...\n                if(!(asg.getLeftOperand() instanceof IEMem)) {\n                    continue;\n                }\n                IEMem target = (IEMem)asg.getLeftOperand();\n                \/\/ ...whose address is an immediate\n                if(!(target.getReference() instanceof IEImm)) {\n                    continue;\n                }\n                IEImm wraddr = (IEImm)target.getReference();\n                if(!wraddr.canReadAsAddress()) {\n                    continue;\n                }\n                long addr = 
wraddr.getValueAsAddress();\n                if(addr &lt; garbageStart || addr &gt;= garbageEnd) {\n                    continue;\n                }\n                b.set(i, ectx.createNop(stm));\n                cnt++;\n            }\n        }\n        return postPerform(updateDFA, cnt);\n    }\n<\/pre><\/div>\n\n\n<p>This optimizer inherits from <a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/units\/code\/asm\/decompiler\/ir\/opt\/AbstractEOptimizer.html\">AbstractEOptimizer<\/a>. Therefore, the perform() method works on an IR-CFG. (Not all optimizers may choose to do so; it is sometimes easier to work directly on <a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/units\/code\/asm\/decompiler\/ir\/opt\/AbstractEStatementOptimizer.html\">statements<\/a> or <a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/units\/code\/asm\/decompiler\/ir\/opt\/AbstractEExpressionOptimizer.html\">expressions<\/a>.)<\/p>\n\n\n\n<p>perform() goes through all statements of every basic block of the IR-CFG. Using the <em>instanceof<\/em> operator, we check that the statement is an <a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/units\/code\/asm\/decompiler\/ir\/IEAssign.html\">assignment<\/a> whose destination is a memory access at an immediate address, such as: <em>Mem(address) = Imm<\/em>. The address is retrieved, and we make sure that it falls within the junk array. If those checks succeed, we replace the assignment with a Nop.<\/p>\n\n\n\n<p>And that is it. Clean and simple &#8211; although not quite portable, since the junk array address and size are hard-coded! 
But that is not the point of this blog, and neither is portability a first-class goal when writing optimizers for custom code.<\/p>\n\n\n\n<p>Next up, let&#8217;s see how to use the plugin in an interactive session using the desktop client.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Building, Deploying, Interactive Use<\/h3>\n\n\n\n<p>In order to use the optimizer within the JEB desktop client, we either:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Register the plugin as a development plugin;<\/li><li>Or build the plugin as a Jar and drop it in JEB&#8217;s coreplugins\/ folder.<\/li><\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Development Plugin<\/h4>\n\n\n\n<p>This is the easiest option. You may consider it as an intermediate step between prototyping with the headless client, as demonstrated above, and a full-blown, deployed Jar plugin.<\/p>\n\n\n\n<p>Open the <em>Options<\/em> panel, <em>Development<\/em> tab, tick the option &#8220;Development Mode&#8221;, add the bin\/ folder of your plugin&#8217;s project to the classpath, and add the classname of your plugin entry-point:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"775\" height=\"569\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-12.png\" alt=\"\" class=\"wp-image-1036\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-12.png 775w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-12-300x220.png 300w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-12-768x564.png 768w\" sizes=\"auto, (max-width: 775px) 100vw, 775px\" \/><figcaption>Setting up a development plugin in JEB UI<\/figcaption><\/figure>\n\n\n\n<p>Press OK and restart JEB. Your plugin will be loaded and ready to use. 
You may now skip to the section &#8220;Using the IR Optimizer Plugin&#8221;.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Building a Jar Plugin<\/h4>\n\n\n\n<p>The alternative is to run build.cmd (on Windows) or build.sh (on Linux\/macOS), which calls an Ant script in the scripts\/ folder. Therefore, make sure to have <a href=\"https:\/\/ant.apache.org\/bindownload.cgi\">Ant installed<\/a> on your system first. You may also customize the plugin name and version before building.<\/p>\n\n\n\n<p>The resulting Jar plugin file will be generated in your project&#8217;s out\/ folder. Copy it to your JEB coreplugins\/ folder and start the JEB client. Your plugin will be automatically loaded, along with the other plugins.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Using the IR Optimizer Plugin<\/h4>\n\n\n\n<p>If your plugin has the type STANDARD (default), then, as explained earlier, it will be invoked by the optimizations&#8217; orchestrator automatically, at various times during the decompilation pipeline. 
If that&#8217;s the mode you&#8217;d like to choose, make sure that your plugin is generic enough to handle all types of input routines, else you&#8217;re in for some strange surprises if you ever forget to remove it from your coreplugins\/ folder.<\/p>\n\n\n\n<p>An alternative is to convert it to an on-demand plugin:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: java; title: ; notranslate\" title=\"\">\npublic EOptExample1() {\n        super(DataChainsUpdatePolicy.UPDATE_IF_OPTIMIZED);\n        getPluginInformation().setName(&quot;Sample IR Optimizer #1&quot;);\n        getPluginInformation().setDescription(&quot;Remove IR-statements reduced to \\&quot;*(&amp;garbage + delta) = xxx\\&quot;&quot;);\n        getPluginInformation().setVersion(Version.create(1, 0, 0));\n\n        \/\/ Standard optimizers are normally run, as part of the IR optimization stages in the decompilation pipeline\n        \/\/setType(OptimizerType.STANDARD);\n\n        \/\/ alternative (better for production \/ in UI use):\n        setType(OptimizerType.ON_DEMAND);\n        setPreferredExecutionStage(-NativeDecompilationStage.LIFTING_COMPLETED.getId());\n        setPostProcessingActionFlags(PPA_OPTIMIZATION_PASS_FULL);\n    }\n<\/pre><\/div>\n\n\n<p>&#8211; Line 11 makes the optimizer <em>on-demand<\/em>. 
Users must manually activate it, on specific code.<br>&#8211; Line 12 is recommended for on-demand optimizers: we specify at which point in the pipeline the plugin should be called.<br>&#8211; Finally, line 13 sets some post-processing flags, specifying that a full round of standard optimizations must be performed after our custom optimizer has run: this will allow cleaning up code remnants and optimizing our IR-CFG further &#8211; something made possible after running an optimization pass like this one.<\/p>\n\n\n\n<p>On-demand optimizer plugins show up in the <em>File<\/em>, <em>Advanced Unit Options<\/em> dialog box, which you may bring up when a decompiled routine has the focus:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"832\" height=\"412\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-13.png\" alt=\"\" class=\"wp-image-1038\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-13.png 832w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-13-300x149.png 300w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-13-768x380.png 768w\" sizes=\"auto, (max-width: 832px) 100vw, 832px\" \/><figcaption>List of on-demand optimizers managed by a given decompiler instance<\/figcaption><\/figure>\n\n\n\n<p>Tick the optimizer box and press OK. 
The routine will be re-decompiled.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Clean Code<\/h3>\n\n\n\n<p>Regardless of which method you choose, once cleaned up, the IR will allow for better downstream pipeline phases, including typing, AST generation, AST optimizations, etc.<\/p>\n\n\n\n<p>The pseudo-C code has become quite readable:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"915\" height=\"538\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-14.png\" alt=\"\" class=\"wp-image-1039\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-14.png 915w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-14-300x176.png 300w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/01\/image-14-768x452.png 768w\" sizes=\"auto, (max-width: 915px) 100vw, 915px\" \/><figcaption>The same decompiled method, after deobfuscation by the custom plugin.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>That is it for part 2. We scratched the surface of IR optimizers (which themselves are a relatively small &#8211; albeit important &#8211; part of the overall decompilation pipeline <sup class='footnote'><a href='#fn-1001-2' id='fnref-1001-2' onclick='return fdfootnote_show(1001)'>2<\/a><\/sup>) but it&#8217;s a good start. I strongly encourage you to experiment and ask your questions on our Slack channel. One ongoing effort right now is to bring the API documentation up to speed in terms of contents and sample code.<\/p>\n\n\n\n<p>In part 3, we will continue exploring IR optimizers. Later on in the series, we will show how to write AST optimizers <sup class='footnote'><a href='#fn-1001-3' id='fnref-1001-3' onclick='return fdfootnote_show(1001)'>3<\/a><\/sup>, how to write decompilation modules, and show how existing decompilers can be customized further. 
Stay tuned!<\/p>\n\n\n<div class='footnotes' id='footnotes-1001'><div class='footnotedivider'><\/div><ol><li id='fn-1001-1'> JEB must have been previously run, at least once: EULA accepted, license key generated, etc. <span class='footnotereverse'><a href='#fnref-1001-1'>&#8617;<\/a><\/span><\/li><li id='fn-1001-2'> The decompilation pipeline is one component of the native analysis pipeline, which is one module, among tens, of the JEB back-end: the <a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/packages.html\">public API<\/a> is worth exploring if you&#8217;re into advanced use cases. <span class='footnotereverse'><a href='#fnref-1001-2'>&#8617;<\/a><\/span><\/li><li id='fn-1001-3'> AST generation is one of the very final decompilation phases &#8211; working on the syntax tree serves different purposes than working on the IR <span class='footnotereverse'><a href='#fnref-1001-3'>&#8617;<\/a><\/span><\/li><\/ol><\/div>","protected":false},"excerpt":{"rendered":"<p>In part 1 of this series, we gave an overview of the Intermediate Representation used by JEB&#8217;s Native Analysis Pipeline, as well as a simple Python script demonstrating how to use the API to access and print out IR-CFG of decompiled routines. In part 2, we continue our exploration of JEB IR. 
We will show &hellip; <a href=\"https:\/\/www.pnfsoftware.com\/blog\/jeb-native-pipeline-ir-optimizers-part-2\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">JEB Native Analysis Pipeline \u2013 Part 2: IR Optimizers<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[21,3,18,5],"tags":[],"class_list":["post-1001","post","type-post","status-publish","format-standard","hentry","category-api-jeb3","category-decompilation","category-jeb3","category-obfuscation"],"_links":{"self":[{"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/posts\/1001","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/comments?post=1001"}],"version-history":[{"count":0,"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/posts\/1001\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/media?parent=1001"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/categories?post=1001"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/tags?post=1001"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}