{"id":3867,"date":"2021-02-18T17:39:11","date_gmt":"2021-02-19T01:39:11","guid":{"rendered":"https:\/\/www.pnfsoftware.com\/blog\/?p=3867"},"modified":"2021-02-21T04:21:18","modified_gmt":"2021-02-21T12:21:18","slug":"traveling-around-mars-with-c-emulation","status":"publish","type":"post","link":"https:\/\/www.pnfsoftware.com\/blog\/traveling-around-mars-with-c-emulation\/","title":{"rendered":"Traveling Around Mars With C Emulation"},"content":{"rendered":"\n<p><em>Disclaimer: a long time ago in our galaxy, we published part 1 of this blog post; then we decided to wait for the next major release of JEB decompiler before publishing the rest. A year and a half later, JEB 4.0 is finally out! So it is time for us to publish our complete adventure with MarsAnalytica crackme. This time as one blog covering the full story.<\/em><\/p>\n\n\n\n<p>In this blog post, we will describe our journey toward analyzing a heavily obfuscated crackme dubbed &#8220;MarsAnalytica&#8221;, by working with JEB&#8217;s decompiled C code <sup class='footnote'><a href='#fn-3867-1' id='fnref-3867-1' onclick='return fdfootnote_show(3867)'>1<\/a><\/sup>.<\/p>\n\n\n\n<p><strong>To reproduce the analysis presented here, make sure to update JEB to version 4.0+.<\/strong><\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Part 1: Reconnaissance<\/h1>\n\n\n\n<p> MarsAnalytica crackme was created by <a href=\"https:\/\/twitter.com\/0xTowel\">0xTowel<\/a> for <a href=\"https:\/\/nsec.io\/competition\/\">NorthSec CTF 2018.<\/a> The challenge was made public after the CTF with an intriguing presentation by its author: <\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p> My reverse engineering challenge &#8216;MarsAnalytica&#8217; went unsolved at #<strong>nsec18<\/strong> #<strong>CTF<\/strong>. Think you can be the first to solve it? It features heavy #<strong>obfuscation<\/strong> and a unique virtualization design.   
<br><\/p><cite><a href=\"https:\/\/twitter.com\/0xTowel\/status\/999435131281723392\">0xTowel<\/a><\/cite><\/blockquote>\n\n\n\n<p>Given that exciting presentation, we decided to use this challenge mainly as a playground to explore and push JEB&#8217;s limits (and if we happen to solve it on the road, that would be great!).<\/p>\n\n\n\n<p class=\"has-light-gray-background-color has-background\">The MarsAnalytica sample analyzed in this blog post is the one available on <a href=\"https:\/\/github.com\/0xTowel\/MarsAnalytica\">0xTowel&#8217;s GitHub<\/a> <sup class='footnote'><a href='#fn-3867-2' id='fnref-3867-2' onclick='return fdfootnote_show(3867)'>2<\/a><\/sup>. Another version seems to be available on <a href=\"https:\/\/ringzer0ctf.com\/challenges\">RingZer0 website<\/a>, called &#8220;MarsReloaded&#8221;.<\/p>\n\n\n\n<p>So, let&#8217;s examine the beast!  The program is a <strong>large x86-64 ELF<\/strong> (around 10.8 MB) which, once executed, greets the user like this:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"641\" height=\"193\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/image.png\" alt=\"\" class=\"wp-image-1064\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/image.png 641w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/image-300x90.png 300w\" sizes=\"auto, (max-width: 641px) 100vw, 641px\" \/><\/figure>\n\n\n\n<p>Inserting a dummy input gives:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"679\" height=\"75\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/image-1.png\" alt=\"\" class=\"wp-image-1065\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/image-1.png 679w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/image-1-300x33.png 300w\" sizes=\"auto, (max-width: 679px) 100vw, 679px\" 
\/><\/figure>\n\n\n\n<p>It appears we have to find a correct Citizen ID! Now let&#8217;s open the executable in JEB. First, the entry point routine:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_main.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"343\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_main-1024x343.png\" alt=\"\" class=\"wp-image-1113\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_main-1024x343.png 1024w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_main-300x101.png 300w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_main-768x258.png 768w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_main.png 1190w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption>Entry Point<\/figcaption><\/figure>\n\n\n\n<p>Ok, <a href=\"http:\/\/refspecs.linuxbase.org\/LSB_3.1.0\/LSB-generic\/LSB-generic\/baselib---libc-start-main-.html\">the classic libc entry point<\/a>, now let&#8217;s look at strings and imports:<\/p>\n\n\n\n<figure class=\"wp-block-gallery columns-2 is-cropped wp-block-gallery-1 is-layout-flex wp-block-gallery-is-layout-flex\"><ul class=\"blocks-gallery-grid\"><li class=\"blocks-gallery-item\"><figure><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_strings-4.png\"><img loading=\"lazy\" decoding=\"async\" width=\"386\" height=\"740\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_strings-4.png\" alt=\"\" data-id=\"1120\" data-link=\"https:\/\/www.pnfsoftware.com\/blog\/?attachment_id=1120\" class=\"wp-image-1120\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_strings-4.png 386w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_strings-4-156x300.png 
156w\" sizes=\"auto, (max-width: 386px) 100vw, 386px\" \/><\/a><figcaption class=\"blocks-gallery-item__caption\">Strings<\/figcaption><\/figure><\/li><li class=\"blocks-gallery-item\"><figure><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_imports-4.png\"><img loading=\"lazy\" decoding=\"async\" width=\"195\" height=\"391\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_imports-4.png\" alt=\"\" data-id=\"1121\" data-link=\"https:\/\/www.pnfsoftware.com\/blog\/?attachment_id=1121\" class=\"wp-image-1121\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_imports-4.png 195w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_imports-4-150x300.png 150w\" sizes=\"auto, (max-width: 195px) 100vw, 195px\" \/><\/a><figcaption class=\"blocks-gallery-item__caption\">Imports<\/figcaption><\/figure><\/li><\/ul><\/figure>\n\n\n\n<p>A few interesting imports: <code>getchar()<\/code> to read user input, and <code>putchar()<\/code> and <code>puts()<\/code> to write. Also, some memory manipulation routines, <code>malloc()<\/code> and <code>memcpy()<\/code>. No particular strings stand out though, not even the greeting message we previously saw. 
This suggests we might be missing something.<\/p>\n\n\n\n<p>Actually, looking at the native navigation bar (right-side of the screen by default), it seems JEB analyzed very few areas of the executable:<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"530\" height=\"41\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_navigation_bar.png\" alt=\"\" class=\"wp-image-1124\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_navigation_bar.png 530w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_navigation_bar-300x23.png 300w\" sizes=\"auto, (max-width: 530px) 100vw, 530px\" \/><figcaption>Navigation Bar <br>(green is cursor&#8217;s location, grey represents area without any code or data)<\/figcaption><\/figure><\/div>\n\n\n\n<p>To understand what happened let&#8217;s first look at JEB&#8217;s notifications window (File &gt; Notifications):<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/11\/1_notif_natives-1024x208.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"208\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/11\/1_notif_natives-1024x208.png\" alt=\"\" class=\"wp-image-3873\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/11\/1_notif_natives-1024x208.png 1024w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/11\/1_notif_natives-300x61.png 300w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/11\/1_notif_natives-768x156.png 768w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/11\/1_notif_natives.png 1126w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption>Notifications Window<\/figcaption><\/figure>\n\n\n\n<p>An interesting notification concerns the &#8220;Initial native analysis 
styles&#8221;, which indicates that code gaps were processed in <em>PROLOGUES_ONLY<\/em> mode (also known as a &#8220;conservative&#8221; analysis). As its name implies, code gaps are then disassembled only if they match a known routine prologue pattern (for the identified compiler and architecture). <\/p>\n\n\n\n<p><strong>This likely explains why most&nbsp;of&nbsp;the&nbsp;executable&nbsp;was&nbsp;not&nbsp;analyzed<\/strong>: the control-flow could not be safely followed and unreferenced code does not start with common prologue patterns.<\/p>\n\n\n\n<p class=\"has-text-align-left has-light-gray-background-color has-background\"><strong>Why did JEB use conservative analysis by default? <\/strong>JEB usually employs aggressive analysis on standard Linux executables, and disassembles (almost) <em>anything<\/em> within code areas (also known as &#8220;linear sweep disassembly&#8221;). In this case, JEB went conservative because the ELF file looks non-standard (e.g., its sections were stripped).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Explore The Code (At Assembly Level)<\/h2>\n\n\n\n<p>Let&#8217;s take a look at the actual <code>main()<\/code> (<a href=\"http:\/\/refspecs.linuxbase.org\/LSB_3.1.0\/LSB-generic\/LSB-generic\/baselib---libc-start-main-.html\">first argument of <code>__libc_start_main()<\/code><\/a>):<\/p>\n\n\n\n<figure class=\"wp-block-gallery columns-3 is-cropped wp-block-gallery-2 is-layout-flex wp-block-gallery-is-layout-flex\"><ul class=\"blocks-gallery-grid\"><li class=\"blocks-gallery-item\"><figure><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_first_handler_1-2.png\"><img loading=\"lazy\" decoding=\"async\" width=\"373\" height=\"801\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_first_handler_1-2.png\" alt=\"\" data-id=\"1312\" data-link=\"https:\/\/www.pnfsoftware.com\/blog\/?attachment_id=1312\" class=\"wp-image-1312\" 
srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_first_handler_1-2.png 373w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_first_handler_1-2-140x300.png 140w\" sizes=\"auto, (max-width: 373px) 100vw, 373px\" \/><\/a><figcaption class=\"blocks-gallery-item__caption\">main() code <br>(part 1)<\/figcaption><\/figure><\/li><li class=\"blocks-gallery-item\"><figure><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_first_handler_2-1.png\"><img loading=\"lazy\" decoding=\"async\" width=\"375\" height=\"811\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_first_handler_2-1.png\" alt=\"\" data-id=\"1150\" data-link=\"https:\/\/www.pnfsoftware.com\/blog\/?attachment_id=1150\" class=\"wp-image-1150\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_first_handler_2-1.png 375w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_first_handler_2-1-139x300.png 139w\" sizes=\"auto, (max-width: 375px) 100vw, 375px\" \/><\/a><figcaption class=\"blocks-gallery-item__caption\">main() code<br>(part 2)<br><\/figcaption><\/figure><\/li><li class=\"blocks-gallery-item\"><figure><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_first_handler_3-1.png\"><img loading=\"lazy\" decoding=\"async\" width=\"375\" height=\"807\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_first_handler_3-1.png\" alt=\"\" data-id=\"1151\" data-link=\"https:\/\/www.pnfsoftware.com\/blog\/?attachment_id=1151\" class=\"wp-image-1151\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_first_handler_3-1.png 375w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_first_handler_3-1-139x300.png 139w\" sizes=\"auto, (max-width: 375px) 100vw, 375px\" \/><\/a><figcaption class=\"blocks-gallery-item__caption\">main() 
code<br>(part 3)<\/figcaption><\/figure><\/li><\/ul><\/figure>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_first_handler_4-1.png\"><img loading=\"lazy\" decoding=\"async\" width=\"377\" height=\"272\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_first_handler_4-1.png\" alt=\"\" class=\"wp-image-1153\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_first_handler_4-1.png 377w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/recon_first_handler_4-1-300x216.png 300w\" sizes=\"auto, (max-width: 377px) 100vw, 377px\" \/><\/a><figcaption>main() code<br>(part 4)<\/figcaption><\/figure><\/div>\n\n\n\n<p>Ok&#8230; that&#8217;s where the fun begins!<\/p>\n\n\n\n<p>So, first come a few <code>memcpy()<\/code> calls that copy <strong>large<\/strong> memory areas onto the stack, followed by a series of &#8220;obfuscated&#8221; computations on this data. The <code>main()<\/code>&nbsp;routine eventually returns to an address computed in the&nbsp;<code>rax<\/code>&nbsp;register. JEB&#8217;s disassembler was not able to determine this value, so it stopped analyzing there. <\/p>\n\n\n\n<p>Let&#8217;s open the binary in JEB&#8217;s debugger, and retrieve the final <code>rax<\/code> value at runtime: <code>0x402335<\/code>. We ask JEB to create a routine at this address (&#8220;Create Procedure&#8221;, P), and end up on very similar code. <strong>After manually following the control-flow, we end up on very large routines (around 8 KB each), with complex control-flow, built on similar obfuscated patterns. 
<\/strong><\/p>\n\n\n\n<p>And yet at this point we have only seen a <em>fraction<\/em> of this 10MB executable&#8230; We might naively estimate that there are more than 1000 routines like these, if the whole binary is built this way (10MB\/8KB = 1250)!<\/p>\n\n\n\n<p class=\"has-light-gray-background-color has-background\">Most obfuscated routines re-use the same stack frame (initialized in <code>main()<\/code> with the series of <code>memcpy()<\/code>). In other words, it looks like a very large function has been divided into chunks, connected to each other by obfuscated control flow computations.<\/p>\n\n\n\n<p><strong>At this point, it seems pretty clear that a first objective would be to properly retrieve all native routines. <\/strong>Arguably the most robust and elegant way to do that would be to follow the control flow, starting from the entry point routine. But how do we follow the control flow through all these obfuscated computations?<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Explore The Code (At C Level)<\/h2>\n\n\n\n<p>Let&#8217;s now take a look at the pseudo-C code produced by JEB for those first routines. 
For example, here is <code>main()<\/code>:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/11\/2_main_decompiled.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1822\" height=\"808\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/11\/2_main_decompiled.png\" alt=\"\" class=\"wp-image-3880\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/11\/2_main_decompiled.png 1822w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/11\/2_main_decompiled-300x133.png 300w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/11\/2_main_decompiled-1024x454.png 1024w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/11\/2_main_decompiled-768x341.png 768w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/11\/2_main_decompiled-1536x681.png 1536w\" sizes=\"auto, (max-width: 1822px) 100vw, 1822px\" \/><\/a><figcaption>Decompiled main()<\/figcaption><\/figure>\n\n\n\n<p>Overall, around 40 lines of C code, most of them being simple assignments, and a few others being complex operations. <strong>In comparison to the 200 non-trivial assembly instructions previously <strong>shown<\/strong>, that&#8217;s pretty encouraging.<\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What Do We Know<\/h2>\n\n\n\n<p>Let&#8217;s sum up what we noticed so far: MarsAnalytica&#8217;s executable is divided into (pretty large) handler routines, each of them passing control to the next one by computing its address. 
For that purpose, each handler reads values from a large stack, makes a series of non-trivial computations on them, then writes new values back to the stack.<\/p>\n\n\n\n<p class=\"has-light-gray-background-color has-background\"><a href=\"https:\/\/twitter.com\/0xTowel\/status\/999435131281723392\">As originally mentioned by 0xTowel<\/a>, the crackme author, it looks like a <a href=\"http:\/\/static.usenix.org\/event\/woot09\/tech\/full_papers\/rolles.pdf\">virtual-machine style obfuscation<\/a>, where bytecodes are read from memory, and are interpreted to guide the execution. It should be noted that virtual machine handlers are never re-executed: execution seems to go from lower to higher addresses, with new handlers being discovered and executed.<\/p>\n\n\n\n<p>Also, let&#8217;s notice that while the executable is strongly obfuscated, there is some good news:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li> <strong>There does not seem to be any self-modifying code<\/strong>, meaning that all the code is <em>statically visible<\/em>; we &#8220;just&#8221; have to compute the control-flow to find it.<\/li><li><strong>JEB decompiled C code looks (pretty) simple<\/strong>: most C statements are simple assignments, except for some lengthy expressions always based on the same operations; the decompilation pipeline simplified away parts of the complexity of the various assembly code patterns.<\/li><li><strong>There are very few subroutines called (we will come back to those later)<\/strong>, and also a few system API calls, so most of the logic is contained within the chain of obfuscated handlers.<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">What Can We Do<\/h2>\n\n\n\n<p>Given all we know, we could try to trace MarsAnalytica execution by <strong>implementing a C emulator<\/strong> working on JEB decompiled code. 
The emulator would <em>simulate<\/em> the execution of each handler routine, update a memory state, and retrieve the address of the next handler.<\/p>\n\n\n\n<p>The emulator would then produce an execution trace, and give us access to the exact memory state at each step. Hence, we should find at some point <em>where<\/em> the user&#8217;s input is processed (typically, a call to <code>getchar()<\/code>), and then hopefully be able to follow how this input gets processed.<\/p>\n\n\n\n<p><strong>The main advantage of this approach is that we are going to work on (small) C routines, rather than large and complex assembly routines.<\/strong> <\/p>\n\n\n\n<p class=\"has-light-gray-background-color has-background\">There are a few additional reasons we decided to go down that road: <br><br> &#8211; <strong>The C emulator would be architecture-independent<\/strong> (JEB decompiles several native architectures to C), allowing us to re-use it in situations where we cannot easily execute the target (e.g. MIPS\/ARM).<br><br> &#8211; It will be <strong>an interesting use-case for JEB&#8217;s public API to manipulate C code<\/strong>. Users could then extend the emulator to suit their needs.<br><br> &#8211; <strong>This approach can only work if the decompilation is correct, i.e. 
if the C code remains faithful to the original native code.<\/strong> In other words, it lets us &#8220;test&#8221; the correctness of JEB&#8217;s decompilation pipeline, which is always interesting for us as JEB developers!<\/p>\n\n\n\n<p><strong>Nevertheless, a major drawback of emulating C code on this particular executable is that we need the C code in the first place!<\/strong> Decompiling 10MB of obfuscated code is going to take a while; therefore this &#8220;plan&#8221; is certainly not the best one for time-limited Capture-The-Flag competitions.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Part 2: Building a (Simple) C Emulator<\/h1>\n\n\n\n<p>The emulator comes as a JEB back-end plugin, whose code can be found <a href=\"https:\/\/github.com\/pnfsoftware\/jeb-c-emulator-plugin\">on our GitHub page<\/a>. It starts in <a href=\"https:\/\/github.com\/pnfsoftware\/jeb-c-emulator-plugin\/blob\/7d17a44ecf6f58b4e777452aef54178f7934f470\/src\/com\/pnf\/plugin\/cemulator\/CEmulatorPlugin.java#L111\">CEmulatorPlugin.java<\/a>, whose logic can be roughly summarized as the following pseudo-code:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\nemulatorState = initEmulatorState();\nwhile(true) {\n  handlerRoutine = analyze(handlerAddress); \/\/ disassemble and decompile\n  emulatorState = emulator.emulate(handlerRoutine, emulatorState);\n\n  handlerAddress = emulatorState.getNextHandlerAddress();\n  if(handlerAddress.isUnknown()){\n    break;\n  }\n}\n<\/pre><\/div>\n\n\n<p><strong>In this part we will focus on the <code><a href=\"https:\/\/github.com\/pnfsoftware\/jeb-c-emulator-plugin\/blob\/7d17a44ecf6f58b4e777452aef54178f7934f470\/src\/com\/pnf\/plugin\/cemulator\/SimpleCEmulator.java#L85\">emulate()<\/a><\/code><\/strong> <strong>method<\/strong>. 
This method\u2019s purpose is to simulate the execution of a given C routine from a given machine state, and to provide in return the final machine state at the end of the routine.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Decompiled C Code<\/h2>\n\n\n\n<p>First things first, let&#8217;s explore what JEB decompiled code looks like, as it will be the input to <code>emulate()<\/code>. <strong>JEB decompiled C code is stored in a <\/strong><em><strong>tree-structured representation<\/strong><\/em><strong>, akin to an <\/strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Abstract_syntax_tree\"><strong>Abstract Syntax Tree<\/strong><\/a><strong> (AST)<\/strong>.<\/p>\n\n\n\n<p>For example, let&#8217;s take the following C function:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>int myfunction()\n{\n    int a = 1;\n    while(a &lt; 3) {\n        a = a + 1;\n    }\n    return a;\n}<\/code><\/pre>\n\n\n\n<p>The JEB representation of the body of <code>myfunction<\/code> would then be:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/04\/ast_exemple.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"490\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/04\/ast_exemple-1024x490.png\" alt=\"\" class=\"wp-image-2568\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/04\/ast_exemple-1024x490.png 1024w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/04\/ast_exemple-300x143.png 300w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/04\/ast_exemple-768x367.png 768w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/04\/ast_exemple.png 1125w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption> AST Representation<br>(rectangles are JEB interfaces, circles are values) <\/figcaption><\/figure>\n\n\n\n<p>As of JEB 4.0, the hierarchy of interfaces representing AST elements (i.e. 
nodes in the graph) is the following:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2021\/02\/3_ICElement_hierar_NEW-1.png\"><img loading=\"lazy\" decoding=\"async\" width=\"286\" height=\"767\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2021\/02\/3_ICElement_hierar_NEW-1.png\" alt=\"\" class=\"wp-image-4088\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2021\/02\/3_ICElement_hierar_NEW-1.png 286w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2021\/02\/3_ICElement_hierar_NEW-1-112x300.png 112w\" sizes=\"auto, (max-width: 286px) 100vw, 286px\" \/><\/a><figcaption>AST ICElement Hierarchy<\/figcaption><\/figure>\n\n\n\n<p>Two parts of this hierarchy are of particular interest to us, in the context of building an emulator:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/units\/code\/asm\/decompiler\/ast\/ICExpression.html\"><strong>ICExpression<\/strong><\/a><strong> represents C expressions<\/strong>, for example <a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/units\/code\/asm\/decompiler\/ast\/ICIdentifier.html\">ICIdentifier<\/a> (a variable), or <a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/units\/code\/asm\/decompiler\/ast\/ICOperation.html\">ICOperation<\/a> (any operation). Our emulator is going to evaluate those expressions, i.e. 
assign concrete values to them.<\/li><li><a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/units\/code\/asm\/decompiler\/ast\/ICStatement.html\"><strong>ICStatement<\/strong><\/a><strong> represents C statements<\/strong>, including notably loops (<a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/units\/code\/asm\/decompiler\/ast\/ICForLoopStm.html\">ICForLoopStm<\/a>, <a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/units\/code\/asm\/decompiler\/ast\/ICWhileLoopStm.html\">ICWhileLoopStm<\/a>), and if statements (<a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/units\/code\/asm\/decompiler\/ast\/ICIfStm.html\">ICIfStm<\/a>). Our emulator is going to execute those statements.<\/li><\/ul>\n\n\n\n<p>Now, a method&#8217;s AST can be retrieved with JEB API by using <a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/units\/code\/asm\/decompiler\/INativeDecompilerUnit.html#decompile(java.lang.String)\">INativeDecompilerUnit.decompile()<\/a> (see <a href=\"https:\/\/github.com\/pnfsoftware\/jeb-c-emulator-plugin\/blob\/7d17a44ecf6f58b4e777452aef54178f7934f470\/src\/com\/pnf\/plugin\/cemulator\/CEmulatorPlugin.java#L161\">CEmulatorPlugin.disassembleAndDecompile()<\/a> for how to disassemble and decompile a not-yet-existing routine).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Where Is The Control Flow?<\/h2>\n\n\n\n<p><strong>While an AST provides a precise representation of C elements, it does not provide explicitly the control flow.<\/strong> That is, the order of execution of statements is not normally provided by an AST, which rather shows how some elements contain others from a syntactic point-of-view. <\/p>\n\n\n\n<p>In order to simulate a C function execution, we are going to need the control flow. 
<strong>So here is our first step: compute the control flow of a C method and make it usable by our emulator.<\/strong><\/p>\n\n\n\n<p>To do so, we implemented a <em>very<\/em> simple Control-Flow Graph (CFG), which is computed from an AST. The code can be found in <a href=\"https:\/\/github.com\/pnfsoftware\/jeb-c-emulator-plugin\/blob\/main\/src\/com\/pnf\/plugin\/cemulator\/CFG.java\">CFG.java<\/a>; please refer to its documentation for the known limitations.<\/p>\n\n\n\n<p>Here is, for example, the CFG for the previously presented routine <code>myfunction()<\/code>:<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"321\" height=\"217\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/ast_to_cfg_cfg-1.png\" alt=\"\" class=\"wp-image-1468\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/ast_to_cfg_cfg-1.png 321w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/ast_to_cfg_cfg-1-300x203.png 300w\" sizes=\"auto, (max-width: 321px) 100vw, 321px\" \/><figcaption>myfunction() CFG<\/figcaption><\/figure><\/div>\n\n\n\n<p class=\"has-light-gray-background-color has-background\"><strong>Why does JEB not provide a CFG for decompiled C code?<\/strong> Mainly because <em>at this point<\/em> the JEB decompiler does not need it. Most important optimizations are done on JEB&#8217;s Intermediate Representation, for which <a href=\"https:\/\/www.pnfsoftware.com\/blog\/jeb-native-pipeline-ir-optimizers-part-2\/\">there is indeed a CFG<\/a>. On the other hand, C optimizations are mainly about &#8220;beautifying&#8221; the code (i.e. 
pure syntactic transformations), which can be done on the AST only <sup class='footnote'><a href='#fn-3867-3' id='fnref-3867-3' onclick='return fdfootnote_show(3867)'>3<\/a><\/sup>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Emulator Implementation<\/h2>\n\n\n\n<p>The main logic of the emulator can be found in <code><a href=\"https:\/\/github.com\/pnfsoftware\/jeb-c-emulator-plugin\/blob\/7d17a44ecf6f58b4e777452aef54178f7934f470\/src\/com\/pnf\/plugin\/cemulator\/SimpleCEmulator.java#L85\">emulate(ICMethod method, EmulatorState inputState)<\/a><\/code>, which emulates a whole C method from a given input state:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: java; title: ; notranslate\" title=\"\">\nCFG cfg = CFG.buildCFG(method);\nICStatement currentStatement = cfg.getEntryPoint();\n\nwhile(currentStatement != null) {\n   currentStatement = emulateStatement(cfg, currentStatement);\n}\n<\/pre><\/div>\n\n\n<p>Before digging into the emulation logic, let&#8217;s see how the <a href=\"https:\/\/github.com\/pnfsoftware\/jeb-c-emulator-plugin\/blob\/main\/src\/com\/pnf\/plugin\/cemulator\/EmulatorState.java\">emulator state<\/a> is represented and initialized.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Emulator State<\/h3>\n\n\n\n<p><strong>The emulator state is a representation of the machine&#8217;s state during emulation<\/strong>; it mainly comprises the state of the memory and of the CPU registers.<\/p>\n\n\n\n<p><strong>The memory state<\/strong> is an <a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/units\/code\/asm\/memory\/IVirtualMemory.html\">IVirtualMemory<\/a> object, JEB&#8217;s interface for representing virtual memory state. 
This memory state is created with MarsAnalytica executable initial memory space (set by JEB loader), and we allocate a large area at an arbitrary address to use as the stack during emulation:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: java; title: ; notranslate\" title=\"\">\n\/\/ initialize from executable memory\nmemory = nativeUnit.getMemory();\n\n\/\/ allocate large stack from BASE_STACK_POINTER_DEFAULT_VALUE (grows downward)\nVirtualMemoryUtil.allocateFillGaps(memory, BASE_STACK_POINTER_DEFAULT_VALUE - 0x10_0000, 0x11_0000, IVirtualMemory.ACCESS_RW);\n<\/pre><\/div>\n\n\n<p><strong>The CPU registers state<\/strong> is simply a Map from register IDs &#8212; JEB specific values to identify native registers &#8212; to values:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: java; title: ; notranslate\" title=\"\">\nMap&lt;Integer, Long&gt; registers = new HashMap&lt;&gt;();\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\">Emulator Logic<\/h3>\n\n\n\n<p> The emulator processes each <a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/units\/code\/asm\/decompiler\/ast\/ICStatement.html\">ICStatement<\/a> in two steps (see <a href=\"https:\/\/github.com\/pnfsoftware\/jeb-c-emulator-plugin\/blob\/7d17a44ecf6f58b4e777452aef54178f7934f470\/src\/com\/pnf\/plugin\/cemulator\/SimpleCEmulator.java#L138\"><code>emulateStatement()<\/code><\/a>):<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li><strong>Update the state according to the statement semantic<\/strong>, i.e. propagate all side-effects of the statement to the emulator state. 
<\/li><li><strong>Determine which statement should be executed next<\/strong>; this might involve evaluating some predicates.<\/li><\/ol>\n\n\n\n<p>For example, let&#8217;s examine the logic to emulate a simple assignment like <code>a = b + 0x17<\/code><sup class='footnote'><a href='#fn-3867-4' id='fnref-3867-4' onclick='return fdfootnote_show(3867)'>4<\/a><\/sup>:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: java; title: ; notranslate\" title=\"\">\nvoid evaluateAssignment(ICAssignment assign) {\n  \/\/ evaluate right-hand side\n  Long rightValue = evaluateExpression(assign.getRight());\n\n  \/\/ assign to left-hand side\n  state.setValue(assign.getLeft(), rightValue);\n}\n<\/pre><\/div>\n\n\n<p>The method <code><a href=\"https:\/\/github.com\/pnfsoftware\/jeb-c-emulator-plugin\/blob\/7d17a44ecf6f58b4e777452aef54178f7934f470\/src\/com\/pnf\/plugin\/cemulator\/SimpleCEmulator.java#L278\">evaluateExpression()<\/a><\/code> is in charge of getting a concrete value for a C expression (i.e. anything under <a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/units\/code\/asm\/decompiler\/ast\/ICExpression.html\">ICExpression<\/a>), which involves recursively processing all the subexpressions of this expression. <\/p>\n\n\n\n<p>In our example, the right-hand side expression to evaluate is an <a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/units\/code\/asm\/decompiler\/ast\/ICOperation.html\">ICOperation<\/a> (<code>b + 0x17<\/code>). 
Here is the extract of the code in charge of evaluating such operations:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: java; title: ; notranslate\" title=\"\">\nLong evaluateOperation(ICOperation operation) {\n        ICExpression opnd1 = operation.getFirstOperand();\n        ICExpression opnd2 = operation.getSecondOperand();\n        ICOperator operator = operation.getOperator();\n\n        switch(operator.getType()) {\n        case ADD:\n            return evaluateExpression(opnd1) + evaluateExpression(opnd2);\n\n&#x5B;...REDACTED...]\n<\/pre><\/div>\n\n\n<p>Therefore, we simply compute a concrete result using the corresponding Java operators for each <a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/units\/code\/asm\/decompiler\/ast\/ICOperator.html\">ICOperator<\/a>, and recursively evaluate the operands.<\/p>\n\n\n\n<p>Now, evaluating variable <code>b<\/code> means either reading memory or a register, depending on where <code>b<\/code> is mapped:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: java; title: ; notranslate\" title=\"\">\nLong getVarValue(ICIdentifier id) {\n  \/\/ read memory for local\/global variables...\n  if(id.getIdentifierClass() == CIdentifierClass.LOCAL || id.getIdentifierClass() == CIdentifierClass.GLOBAL) {\n      return readMemory(getVarAddress(id), getTypeSize(id.getType()));\n  }\n  \/\/ ...otherwise read CPU register\n  else {\n      return registers.get(id.getId());\n  }\n}\n<\/pre><\/div>\n\n\n<p class=\"has-light-gray-background-color has-background\">If <code>b<\/code> is a local variable, i.e. mapped in stack memory, the method <a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/units\/code\/asm\/decompiler\/ast\/ICIdentifier.html#getAddress()\"><code>ICIdentifier.getAddress()<\/code><\/a> provides us its <em>offset from the stack base address<\/em>. 
Also note that an <a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/units\/code\/asm\/decompiler\/ast\/ICIdentifier.html\">ICIdentifier<\/a> has an associated <a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/units\/code\/asm\/decompiler\/ast\/ICType.html\">ICType<\/a>, which provides us the variable&#8217;s size (through the type manager, see emulator&#8217;s <a href=\"https:\/\/github.com\/pnfsoftware\/jeb-c-emulator-plugin\/blob\/7d17a44ecf6f58b4e777452aef54178f7934f470\/src\/com\/pnf\/plugin\/cemulator\/EmulatorState.java#L181\"><code>getTypeSize()<\/code><\/a>).<\/p>\n\n\n\n<p>Finally, evaluating constant <code>0x17<\/code> in the operation <code>b + 0x17<\/code> simply means returning its raw value:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: java; title: ; notranslate\" title=\"\">\nif(expr instanceof ICConstantInteger) {\n    return ((ICConstantInteger&lt;?&gt;)expr).getValueAsLong();\n}\n<\/pre><\/div>\n\n\n<p>For statements with more complex control flow than an assignment, the emulator has to select the correct next statement from the CFG. 
For example, here is the emulation of a while loop <code>wStm<\/code> (<a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/units\/code\/asm\/decompiler\/ast\/ICWhileStm.html\">ICWhileStm<\/a>):<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: java; title: ; notranslate\" title=\"\">\n\/\/ if predicate is true, next statement is while loop body...\nif(evaluateExpression(wStm.getPredicate()) != 0) {\n   return cfg.getNextTrueStatement(wStm);\n}\n\/\/ ...otherwise next statement is the one following while(){..}\nelse {\n   return cfg.getNextStatement(wStm);\n}\n<\/pre><\/div>\n\n\n<p>Refer to the <a href=\"https:\/\/github.com\/pnfsoftware\/jeb-c-emulator-plugin\/blob\/7d17a44ecf6f58b4e777452aef54178f7934f470\/src\/com\/pnf\/plugin\/cemulator\/SimpleCEmulator.java#L138\">complete implementation<\/a> for the gory details.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Emulating System APIs<\/h3>\n\n\n\n<p>In MarsAnalytica, only a few system APIs get called during execution. Among those APIs, only <code>memcpy()<\/code> is actually needed for our emulation, as it serves to initialize the stack (remember <code>main()<\/code>). 
Here is the API emulation logic:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: java; title: ; notranslate\" title=\"\">\nLong simulateWellKnownMethods(ICMethod calledMethod,\n            List&lt;ICExpression&gt; parameters) {\n\n        if(calledMethod.getName().equals(&quot;\u2192time&quot;)) {\n            return 42L; \/\/ value does not matter\n        }\n        else if(calledMethod.getName().equals(&quot;\u2192srand&quot;)) {\n            return 37L; \/\/ value does not matter\n        }\n        else if(calledMethod.getName().equals(&quot;\u2192memcpy&quot;)) {\n            ICExpression dst = parameters.get(0);\n            ICExpression src = parameters.get(1);\n            ICExpression n = parameters.get(2);\n            \/\/ evaluate parameters concrete values\n            &#x5B;...REDACTED...]\n            state.copyMemory(src_, dst_, n_);\n            return dst_;\n          }\n       }\n}\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\">Demo Time<\/h3>\n\n\n\n<p>The final implementation of our tracer can be found in <a href=\"https:\/\/github.com\/pnfsoftware\/jeb-c-emulator-plugin\">our GitHub page<\/a>. 
Once executed, the plugin logs in JEB&#8217;s console an execution trace of the emulated methods, each of them providing the address of the next one:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>&gt; emulating method sub_400DA9...\n  &gt;&gt; done; next method entry point: 0x00402335\n&gt; emulating method sub_402335...\n  &gt;&gt; done; next method entry point: 0x00402335\n&gt; emulating method sub_402335...\n  &gt;&gt; done; next method entry point: 0x00401b8f\n&gt; emulating method sub_401B8F...\n  &gt;&gt; done; next method entry point: 0x004018cd\n&gt; emulating method sub_4018CD...\n  &gt;&gt; done; next method entry point: 0x00401f62\n&gt; emulating method sub_401F62...\n  &gt;&gt; done; next method entry point: 0x00402335\n&gt; emulating method sub_402335...\n  &gt;&gt; done; next method entry point: 0x00403477\n&gt; emulating method sub_403477...\n  &gt;&gt; done; next method entry point: 0x00401502\n&gt; emulating method sub_401502...\n  &gt;&gt; done; next method entry point: 0x004018cd\n\n&#91;...REDACTED...]\n<\/code><\/pre>\n\n\n\n<p>Good news, everyone: the handler addresses are correct (we double-checked them with a debugger). In other words, <strong>JEB&#8217;s decompilation is correct and our emulator remains faithful to the executable&#8217;s logic<\/strong>. Phew&#8230;!<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Part 3: Solving The Challenge<\/h1>\n\n\n\n<h3 class=\"wp-block-heading\">Plot Twist: It Does Not Work<\/h3>\n\n\n\n<p>The first goal of the emulator was to find <em>where<\/em> the user&#8217;s input is manipulated. We were looking in particular for a call to <code>getchar()<\/code>. So we let the emulator run for a <em>long<\/em> time, and&#8230;<\/p>\n\n\n\n<p>&#8230;it never reached a call to <code>getchar()<\/code>. 
<\/p>\n\n\n\n<p>The emulator was correctly passing through the obfuscated handlers (we regularly double-checked their addresses with a debugger), but <em>after a few days<\/em> the executed code was still printing MarsAnalytica&#8217;s magnificent ASCII art prompt (reproduced below). <\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"641\" height=\"193\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/image.png\" alt=\"\" class=\"wp-image-1064\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/image.png 641w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/02\/image-300x90.png 300w\" sizes=\"auto, (max-width: 641px) 100vw, 641px\" \/><figcaption>MarsAnalytica Prompt<\/figcaption><\/figure>\n\n\n\n<p>After investigating, it appeared that characters are printed one by one with <code>putchar()<\/code>, and each of these calls sits in the middle of a heavily obfuscated handler, which is executed only once. More precisely, <strong>after executing more than one third of the whole 10MB, the program is still not done with printing the prompt!<\/strong><\/p>\n\n\n\n<p>As mentioned previously, the &#8220;problem&#8221; with emulating decompiled C code is that we need the decompiled code in the first place, and decompiling lots of obfuscated routines takes time&#8230;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Let&#8217;s Cheat<\/h3>\n\n\n\n<p>Ok, we cannot reach, <em>in a decent amount of time<\/em>, the point where the user&#8217;s input is processed by the program. But the execution up to this point should be deterministic. 
<strong>What if&#8230; we start the emulation at the point where <code>getchar()<\/code> is called, rather than from the entry-point?<\/strong><\/p>\n\n\n\n<p>In other words, we are going to assume that we &#8220;found&#8221; the place where the user&#8217;s input starts to be processed, and use the emulator to analyze <em>how<\/em> this input is processed.<\/p>\n\n\n\n<p>To do so, we used the GDB debugger to set a breakpoint on <code>getchar()<\/code> and dumped both the stack and heap memory at this point <sup class='footnote'><a href='#fn-3867-5' id='fnref-3867-5' onclick='return fdfootnote_show(3867)'>5<\/a><\/sup>. Then, we extended the emulator to be able to <a href=\"https:\/\/github.com\/pnfsoftware\/jeb-c-emulator-plugin\/blob\/7d17a44ecf6f58b4e777452aef54178f7934f470\/src\/com\/pnf\/plugin\/cemulator\/EmulatorState.java#L73\">initialize its memory state from stack\/heap memory dumps<\/a>, and changed the emulation start address to the first call to <code>getchar()<\/code>. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What Now?<\/h2>\n\n\n\n<p>At this point <code>getchar()<\/code> is called to get the first input character, so we let the emulator simulate this API by <a href=\"https:\/\/github.com\/pnfsoftware\/jeb-c-emulator-plugin\/blob\/7d17a44ecf6f58b4e777452aef54178f7934f470\/src\/com\/pnf\/plugin\/cemulator\/MarsAnalyticaCEmulator.java#L84\">returning a pseudo-randomly chosen character<\/a>, so that we can follow the rest of the execution. <strong>After 19 calls to <\/strong><code><strong>getchar()<\/strong><\/code><strong> we finally enter the place where the user&#8217;s input is processed.<\/strong> Hooray&#8230;<\/p>\n\n\n\n<p>Then, we let the emulator run for a whole day, which provided the execution trace we will be working on for the rest of this blog post. 
After digging into the trace, we noticed that <strong>input characters were passed as arguments to a few special routines<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Introducing The Stack Machine<\/h3>\n\n\n\n<p>When we first skimmed through the MarsAnalytica code, we noticed a few routines that seemed special for two reasons:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>While obfuscated routines are executed only once and in a linear fashion (i.e. from low to high memory addresses), these &#8220;special&#8221; routines are at the very beginning of the executable and are called very often during the execution.<\/li><li>The code of these routines is not obfuscated and, at first sight, seems to be related to memory management.<\/li><\/ul>\n\n\n\n<p>For example, here is JEB&#8217;s decompiled code for the first of them (comments are ours):<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\nlong sub_400AAE(unsigned long* param0, int param1) {\n    long result;\n    unsigned long* ptr0 = param0;\n    int v0 = param1;\n\n    if(!ptr0) {\n        result = 0xffffffffL;\n    }\n    else {\n        \/\/ allocate new slot\n        void* ptr1 = \u2192malloc(16L);\n        if(!ptr1) {\n            \/*NO_RETURN*\/ \u2192exit(0);\n        }\n\n        \/\/ set value in new slot\n        *(int*)((long)ptr1 + 8L) = v0;\n\n        \/\/ insert new slot in first position\n        *(long*)ptr1 = *ptr0;\n        *ptr0 = ptr1;\n        result = 0L;\n    }\n\n    return result;\n}\n<\/pre><\/div>\n\n\n<p>What we have here is basically a &#8220;push&#8221; operation for a stack implemented as a linked list (<code>param0<\/code> is a pointer to the top of the stack, <code>param1<\/code> the value to be pushed). 
<\/p>\n\n\n\n<p>Each slot of the stack is 16 bytes, with the first 8 bytes being a pointer to the next slot and the next 4 bytes containing the value (the remaining 4 bytes are not used).<\/p>\n\n\n\n<p><strong>It now seemed clear that these special routines are the crux of the challenge.<\/strong> So we reimplemented <a href=\"https:\/\/github.com\/pnfsoftware\/jeb-c-emulator-plugin\/blob\/7d17a44ecf6f58b4e777452aef54178f7934f470\/src\/com\/pnf\/plugin\/cemulator\/MarsAnalyticaCEmulator.java#L93\">most of them in the emulator<\/a>, mainly as a way to fully understand them. For example, here is our &#8220;push&#8221; implementation:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: java; title: ; notranslate\" title=\"\">\n\/** PUSH(STACK_PTR, VALUE) *\/\nif(calledMethod.getName().equals(&quot;sub_400AAE&quot;)) {\n    Long pStackPtr = evaluateExpression(parameters.get(0));\n    Long pValue = evaluateExpression(parameters.get(1));\n\n    long newChunkAddr = allocateNewChunk();\n\n    \/\/ write value\n    state.writeMemory(newChunkAddr + 8, pValue, 4);\n\n    \/\/ link new chunk to existing stack\n    Long stackAdr = state.readMemory(pStackPtr, 8);\n    state.writeMemory(newChunkAddr, stackAdr, 8);\n\n    \/\/ make new chunk the new stack head\n    state.writeMemory(pStackPtr, newChunkAddr, 8);\n}\n<\/pre><\/div>\n\n\n<p>Overall, these operations implement a custom data structure that can be accessed in a last-in, first-out fashion, but also directly through indexes. Let&#8217;s call this data structure the &#8220;stack machine&#8221;. 
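<\/p>\n\n\n\n<p>To make this more concrete, here is a minimal Python sketch of such a data structure. It is only a toy model resting on assumptions of ours: the class and method names are hypothetical, the structure is backed by a plain list rather than a linked list, and indexes are assumed to count from the bottom of the stack. Values are truncated to 32 bits, since each slot holds a 4-byte value:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nclass StackMachine:\n    # toy model (names are hypothetical): a LIFO stack that also allows access by index\n    WORD = 2**32  # each slot holds a 4-byte value\n\n    def __init__(self, size=0):\n        # pre-allocated slots; index 0 is assumed to be the bottom of the stack\n        self.slots = &#x5B;0] * size\n\n    def push(self, value):\n        self.slots.append(value % self.WORD)\n\n    def pop(self):\n        return self.slots.pop()\n\n    def get(self, index):\n        return self.slots&#x5B;index]\n\n    def set(self, index, value):\n        self.slots&#x5B;index] = value % self.WORD\n\n\nmachine = StackMachine(size=40)\nmachine.set(32, 2700)          # direct access through an index\nmachine.push(machine.get(32))  # last-in...\nmachine.push(2)\na = machine.pop()              # ...first-out: a is 2\nb = machine.pop()              # b is 2700\nmachine.push(a + b)\nprint(machine.pop())           # prints 2702\n<\/pre><\/div>\n\n\n<p>The last lines replay a short set, get, push and pop sequence on this model; adding the two popped values gives 2702. 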
<\/p>\n\n\n\n<p>Here are the most used operators:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>Address<\/strong><\/td><td><strong>Operator<\/strong><br><em>(names are ours)<\/em><\/td><td><strong>Argument(s)<\/strong><\/td><\/tr><tr><td><code>0x400AAE<\/code><\/td><td>PUSH<\/td><td>VALUE<\/td><\/tr><tr><td><code>0x4009D7<\/code><\/td><td>POP<\/td><td>VALUE<\/td><\/tr><tr><td><code>0x400D08<\/code><\/td><td>GET<\/td><td>INDEX<\/td><\/tr><tr><td><code>0x400D55<\/code><\/td><td>SET<\/td><td>INDEX,VALUE<\/td><\/tr><\/tbody><\/table><figcaption>Stack Machine&#8217;s Main Operators<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Tracing The Stack Machine<\/h3>\n\n\n\n<p>At this point, we modified the emulator to log <em>only<\/em> stack operations with their arguments, starting from the first call to <code>getchar()<\/code>. The full trace can be found <a href=\"https:\/\/github.com\/pnfsoftware\/jeb-c-emulator-plugin\/blob\/main\/data\/mars_analytica_stack_machine_trace.log\">here<\/a>, and here is an extract:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>S: SET index:7 value:97\nS: SET index:8 value:98\nS: SET index:13 value:99\nS: SET index:15 value:100\nS: SET index:16 value:101\n\n&#91;...REDACTED...]\n\nS: PUSH 2700\nS: POP (2700)\nS: SET index:32 value:2700\nS: GET index:32\nS: PUSH 2700\nS: PUSH 2\nS: POP (2)\nS: POP (2700)\nS: PUSH 2702\n\n&#91;...REDACTED...]<\/code><\/pre>\n\n\n\n<p>The trace starts with a long series of <code>SET<\/code> operations, which are storing the result of <code>getchar()<\/code> at specific indexes in the stack machine (<code>97<\/code>, <code>98<\/code>, <code>99<\/code>,&#8230; are characters provided by the emulator).<\/p>\n\n\n\n<p>And then, a long series of operations happen, combining the input characters with some constant values. 
Some interesting patterns appeared at this point, for example:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">S: POP <strong>(2)<\/strong>   \nS: POP <strong>(2700)<\/strong>   \nS: PUSH <strong>2702<\/strong>   <\/pre>\n\n\n\n<p>Here an addition was made between the two popped values, and the result was then pushed. Digging into the trace, it appears there are also handlers popping two values and pushing back a subtraction, multiplication, exclusive or, etc.<\/p>\n\n\n\n<p>Another interesting pattern appears in several places:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">S: POP <strong>(16335)<\/strong>   \nS: POP <strong>(1234764)<\/strong>   \nS: PUSH <strong>1<\/strong>   <\/pre>\n\n\n\n<p>Looking at the corresponding C code, it is actually a comparison between the two popped values (&#8220;greater than&#8221; in this case), and the boolean result (0 or 1) is then pushed. Once again, different comparison operators (equal, not equal, &#8230;) are used in different handlers.<\/p>\n\n\n\n<p>Finally, something suspicious also stood out in the trace:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">S: <strong>PUSH 137<\/strong>\nS: <strong>PUSH 99<\/strong>\nS: <strong>POP (137)<\/strong>\nS: <strong>POP (99)<\/strong><\/pre>\n\n\n\n<p>The popped values do not match the order in which they were pushed!<\/p>\n\n\n\n<p>Digging into the code, we ended up at a special routine (<code>0x402AB2<\/code>), which swaps the two top-most values&#8230; So to make things clearer, <a href=\"https:\/\/github.com\/pnfsoftware\/jeb-c-emulator-plugin\/blob\/7d17a44ecf6f58b4e777452aef54178f7934f470\/src\/com\/pnf\/plugin\/cemulator\/MarsAnalyticaCEmulator.java#L37\">the emulator logs in the execution trace a <code>SWAP<\/code> operator<\/a> whenever this routine gets executed.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Where Is My Precious Operator?<\/h3>\n\n\n\n<p>Our objective here is to understand how input characters are manipulated, and what tests are done 
on them. <strong>In other words,<\/strong> <strong>we want to know for each <\/strong><code><strong>POP\/POP\/PUSH<\/strong><\/code><strong> pattern whether it is an operation (and <em>which<\/em> one: addition, subtraction, &#8230;) or a test (and <em>which<\/em> one: equal, greater than, &#8230;). <\/strong><\/p>\n\n\n\n<p class=\"has-light-gray-background-color has-background\">Again, note that routines implementing <code>POP\/POP\/PUSH<\/code> patterns are executed <em>only once<\/em>. So we cannot analyze them individually and then rely on their addresses.<\/p>\n\n\n\n<p><strong>This is where working on decompiled C code becomes particularly handy.<\/strong> For each <code>POP\/POP\/PUSH<\/code> series:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>We check in the method&#8217;s decompiled code whether a C operator was used on the <code>PUSH<\/code> operand. To do so, it is as simple as looking at the operand itself, thanks to the JEB decompiler&#8217;s optimizations! For example, here is a subtraction:<\/li><\/ul>\n\n\n\n<pre class=\"wp-block-preformatted\">...\n<code>long v1 = <strong>pop<\/strong>(v0 - 0x65f48L); <\/code>\n<code>long v2 = <strong>pop<\/strong>(v0 - 0x65f48L); <\/code>\n<code><strong>push<\/strong>(v0 - 0x65f48L, <strong>v1 - v2<\/strong>);<\/code>\n...\n<\/pre>\n\n\n\n<p>When a C operator is found in the second operand of <code>push()<\/code>, the emulator adds the info (with the number of operands) to the trace: <\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">S: POP (137)\nS: POP (99)\nS: PUSH 38\n<strong>| operation: (-,#op=2)<\/strong><\/pre>\n\n\n\n<ul class=\"wp-block-list\"><li>Also, we check if there is an &#8220;if&#8221; statement following a <code>POP<\/code> in the C code. 
For example, here is a &#8220;greater-than&#8221; check between popped values:<\/li><\/ul>\n\n\n\n<pre class=\"wp-block-preformatted\">...\n<code>long v2 = <strong>pop<\/strong>(v0 - 0x65f48L); <\/code>\n<code>long v3 = <strong>pop<\/strong>(v0 - 0x65f48L); <\/code>\n<code>if(<strong>v2 &gt; v3<\/strong>) {<\/code>\n...<\/pre>\n\n\n\n<p>If so, the emulator extracts the C operator used in the if statement and logs it in the trace (as a pseudo stack operator named <code>TEST<\/code>):<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">S: POP (16335)   \nS: POP (1234764)   \n<strong>S: TEST (&gt;,#op=2)<\/strong> \nS: PUSH 0   <\/pre>\n\n\n\n<p class=\"has-light-gray-background-color has-background\">It should be noted that operands are always ordered in the same way: the first popped value is on the left side of the operator. So operators and operands are the only things we need to reconstruct the whole operation.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Time To Go Symbolic<\/h2>\n\n\n\n<p>At this point, our execution trace shows <em>how<\/em> the user&#8217;s input is stored on the stack, and which operations and tests are then done. Our emulator is providing a &#8220;bad&#8221; input, so there are certainly failed checks in our execution trace. <strong>Our goal is now to find these checks, and then the correct input characters.<\/strong><\/p>\n\n\n\n<p>It is now time to introduce &#8220;symbolic&#8221; inputs, rather than using concrete values as we have in our trace. 
To do so, we made <strong>a quick and dirty <a href=\"https:\/\/github.com\/pnfsoftware\/jeb-c-emulator-plugin\/blob\/main\/data\/symbolic_exec_stack_machine.py\">Python script<\/a> to replay stack machine trace using symbolic variables rather than concrete values<\/strong>.<\/p>\n\n\n\n<p>First, we initialize a Python &#8220;stack&#8221; with symbols (the stack is a <code>list()<\/code>, and the symbols are strings representing each character &#8220;<code>c0<\/code>&#8220;, &#8220;<code>c1<\/code>&#8220;, &#8220;<code>c2<\/code>&#8220;&#8230;). We put those symbols at the same indexes used by the initial <code>SET<\/code> operations:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# fill stack with &#039;symbolic&#039; variables (ie, characters)\n# at the initial offset retrieved from the trace\nstack = &#x5B;None] * 50 # arbitrary size\ncharCounter = 0\nstack&#x5B;7] = &#039;c&#039; + str(charCounter) # S: SET index:7 value:c0\ncharCounter+=1\nstack&#x5B;8] = &#039;c&#039; + str(charCounter) # S: SET index:8 value:c1\n\n&#x5B;... REDACTED ...]\n<\/pre><\/div>\n\n\n<p>We also need a temporary storage for expressions that get popped from the stack. <\/p>\n\n\n\n<p>Then, we read the trace file and for each stack operation we execute the equivalent operation on our Python stack:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nif operator == &quot;SWAP&quot;:\n  last = stack.pop()\n  secondToLast = stack.pop()\n  stack.append(last)\n  stack.append(secondToLast)\n\nelif operator == &quot;GET&quot;:\n  index = readIndexFromLine(curLine)\n  temporaryStorage.append(stack&#x5B;int(index)])\n\nelif operator == &quot;SET&quot;:\n  index = readIndexFromLine(curLine)\n  stack&#x5B;int(index)] = temporaryStorage.pop()\n\nelif operator == &quot;POP&quot;:\n  value = stack.pop()\n  temporaryStorage.append(value)\n\n&#x5B;... 
REDACTED ...]\n<\/pre><\/div>\n\n\n<p><strong>Now here is the important part<\/strong>: whenever there is an operation, we build a new symbol by &#8220;joining&#8221; the symbol operands and the operator. Here is an example of an addition between symbols &#8220;<code>c5<\/code>&#8221; and &#8220;<code>c9<\/code>&#8221;, corresponding respectively to the concrete input characters initially stored at indexes 26 and 4:<\/p>\n\n\n\n<figure class=\"wp-block-table is-style-stripes\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>Concrete Trace<\/strong><\/td><td><strong>Symbolic Trace<\/strong><\/td><\/tr><tr><td><pre class=\"wp-block-preformatted\">...<br>GET index:26<br><br>PUSH 102<br><br>GET index:4<br><br>PUSH 106<br><br>POP (106)<br><br>POP (102)<br><br>PUSH 208<br>  | <strong>operation: (+,#op=2)<\/strong><br>...<\/pre><\/td><td><pre class=\"wp-block-preformatted\">...<br>GET index:26<br><br>PUSH \"c5\"<br><br>GET index:4<br><br>PUSH \"c9\"<br><br>POP (\"c9\")<br><br>POP (\"c5\")<br><br><strong>PUSH \"c9+c5\"<\/strong><br><br>...<\/pre><\/td><\/tr><\/tbody><\/table><figcaption>Concrete execution trace, and its corresponding symbolic trace; on the symbolic side, rather than pushing the actual result of 106 + 102, we build an addition between the two symbols corresponding to the two concrete values<\/figcaption><\/figure>\n\n\n\n<p class=\"has-light-gray-background-color has-background\">Note that our symbolic executor starts with a clean stack, containing only input symbols. All constants used during the computation indeed come from the bytecode (the large memory area copied onto the (native) stack at the beginning of the execution), and not from the stack machine.<\/p>\n\n\n\n<p>We can then observe series of operations on input symbols getting built by successive <code>POP\/POP\/PUSH<\/code> patterns, and finally being checked against specific values. 
Here is an extract of our stack at the end:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n((((c12-c10)^c13)*(c6*c14))!=16335)\n(((c18^c1)^(c15-c7))!=83)\n((((c9+c5)^c0)*(c17-c16))!=4294961394)\n((c3-c11)!=11)\n(((c2+c4)^c8)!=3)\n((c8+(c15-c4))!=176)\n((((c9^c10)-(c11+c18))^c6)!=4294967097)\n(((c1*(c0^c17))+(c2*c16))!=9985)\n(((c14*c13)-c7)!=2083)\n(((c12+c3)-c5)!=110)\n(((c8*c10)+(c9+c13))!=5630)\n(((c5-c16)-(c2+c0))!=4294967114)\n((c17*(c14^c7))!=7200)\n(((c1*c3)+(c6*c11))!=17872)\n(((c12-c15)-(c18*c4))!=4294961888)\n(((c11*c2)+(c3*c15))!=18888)\n((c16*(c5+c13))!=15049)\n((c17*(c0+c10))!=12150)\n((c18*(c14^c6))!=10080)\n(((c7+c12)-c4)!=132)\n((c8+(c1*c9))!=2453)\n<\/pre><\/div>\n\n\n<p>It seems pretty clear that those checks are the ones we are looking for, except that we need to convert the inequality tests into equality tests.<\/p>\n\n\n\n<p>Now, how do we find values for the symbols &#8220;<code>c0<\/code>&#8221;, &#8220;<code>c1<\/code>&#8221;, &#8230; that pass these tests? 
<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The Final<\/h2>\n\n\n\n<p>To find the correct input characters, we used <a href=\"https:\/\/rise4fun.com\/z3\/tutorial\">Z3 SMT solver<\/a> Python bindings, and let <a href=\"https:\/\/github.com\/pnfsoftware\/jeb-c-emulator-plugin\/blob\/main\/data\/solve.py\">the solver do its magic<\/a>:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nfrom z3 import *\n\n# initialize our characters as 8-bit bitvectors\nc0 = BitVec(&#039;c0&#039;, 8)\nc1 = BitVec(&#039;c1&#039;, 8)\nc2 = BitVec(&#039;c2&#039;, 8)\nc3 = BitVec(&#039;c3&#039;, 8)\nc4 = BitVec(&#039;c4&#039;, 8)\nc5 = BitVec(&#039;c5&#039;, 8)\nc6 = BitVec(&#039;c6&#039;, 8)\nc7 = BitVec(&#039;c7&#039;, 8)\nc8 = BitVec(&#039;c8&#039;, 8)\nc9 = BitVec(&#039;c9&#039;, 8)\nc10 = BitVec(&#039;c10&#039;, 8)\nc11 = BitVec(&#039;c11&#039;, 8)\nc12 = BitVec(&#039;c12&#039;, 8)\nc13 = BitVec(&#039;c13&#039;, 8)\nc14 = BitVec(&#039;c14&#039;, 8)\nc15 = BitVec(&#039;c15&#039;, 8)\nc16 = BitVec(&#039;c16&#039;, 8)\nc17 = BitVec(&#039;c17&#039;, 8)\nc18 = BitVec(&#039;c18&#039;, 8)\n\ns = Solver()\n\n# allowed character range\ns.add(c0 &gt; 32, c0 &lt; 127)\ns.add(c0 &gt; 32, c0 &lt; 127)\ns.add(c1 &gt; 32, c1 &lt; 127)\ns.add(c2 &gt; 32, c2 &lt; 127)\ns.add(c3 &gt; 32, c3 &lt; 127)\n&#x5B;... REDACTED ...]\n\n# checks\ns.add((((c12-c10)^c13)*(c6*c14))==16335)\ns.add(((c18^c1)^(c15-c7))==83)\ns.add((((c9+c5)^c0)*(c17-c16))==4294961394)\ns.add((c3-c11)==11)\ns.add(((c2+c4)^c8)==3)\ns.add((c8+(c15-c4))==176)\n&#x5B;... 
REDACTED ...]\n<\/pre><\/div>\n\n\n<p class=\"has-light-gray-background-color has-background\">Here is another advantage of working with C code: the expressions built from our emulator&#8217;s trace use <em>high-level<\/em> operators, which are directly understood by Z3.<\/p>\n\n\n\n<p>Finally, we ask Z3 for a possible solution to the constraints, and we build the final string from the values of <code>c0<\/code>, <code>c1<\/code>, &#8230;:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# check() must report sat before a model is available\nassert s.check() == sat\nm = s.model()\nresult = &#039;&#039;\nresult += chr(m&#x5B;c0].as_long())\nresult += chr(m&#x5B;c1].as_long())\nresult += chr(m&#x5B;c2].as_long())\nresult += chr(m&#x5B;c3].as_long())\n...\n<\/pre><\/div>\n\n\n<p>And&#8230;<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"644\" height=\"229\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/03\/victory-1.png\" alt=\"\" class=\"wp-image-2119\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/03\/victory-1.png 644w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2019\/03\/victory-1-300x107.png 300w\" sizes=\"auto, (max-width: 644px) 100vw, 644px\" \/><figcaption>Hurray!<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>We hope you enjoyed this blog post, where we used JEB&#8217;s decompiled C code to analyze a heavily obfuscated executable. <\/p>\n\n\n\n<p>Please refer to our <a href=\"https:\/\/github.com\/pnfsoftware\/jeb-c-emulator-plugin\">GitHub page<\/a> for the emulator code. 
While it has been tailored for the MarsAnalytica crackme, it can be extended to emulate any executable&#8217;s decompiled C code (the MarsAnalytica-specific emulation logic is contained in the subclass <a href=\"https:\/\/github.com\/pnfsoftware\/jeb-c-emulator-plugin\/blob\/main\/src\/com\/pnf\/plugin\/cemulator\/MarsAnalyticaCEmulator.java\">MarsAnalyticaCEmulator<\/a>). <\/p>\n\n\n\n<p>You can run the plugin directly from the JEB UI (refer to the <a href=\"https:\/\/github.com\/pnfsoftware\/jeb-c-emulator-plugin\/blob\/main\/README.md\">README<\/a>): <\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"902\" height=\"263\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/11\/image.png\" alt=\"\" class=\"wp-image-3975\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/11\/image.png 902w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/11\/image-300x87.png 300w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/11\/image-768x224.png 768w\" sizes=\"auto, (max-width: 902px) 100vw, 902px\" \/><\/figure>\n\n\n\n<p>By default, it will show emulation traces as text subunits in the JEB project (the stack machine trace in MarsAnalytica mode, or just the C statement trace):<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"799\" height=\"300\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/11\/image-1.png\" alt=\"\" class=\"wp-image-3976\" srcset=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/11\/image-1.png 799w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/11\/image-1-300x113.png 300w, https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2020\/11\/image-1-768x288.png 768w\" sizes=\"auto, (max-width: 799px) 100vw, 799px\" \/><figcaption>Plugin output: the left panel is the MarsAnalytica stack machine trace (when MarsAnalytica-specific emulation logic 
is enabled), while the right panel shows the C statements emulation trace<\/figcaption><\/figure>\n\n\n\n<p>Alternatively, the plugin comes with a <a href=\"https:\/\/github.com\/pnfsoftware\/jeb-c-emulator-plugin\/blob\/main\/src\/com\/pnf\/plugin\/cemulator\/HeadlessClient.java\">headless client<\/a>, better suited to gathering long-running emulation traces.<\/p>\n\n\n\n<p>Finally, kudos to <a href=\"https:\/\/twitter.com\/0xTowel\">0xTowel<\/a> for the awesome challenge! You can also check the excellent <a href=\"https:\/\/re-dojo.github.io\/write-ups\/2018-10-28-nsec-2018-mars-analytica\/\">Scud&#8217;s solution<\/a>.<\/p>\n\n\n\n<p>Feel free to message us on&nbsp;<a href=\"https:\/\/jebdecompiler.slack.com\/\">Slack<\/a> if you have any questions. In particular, we would be very interested to hear from you if you attempt to solve complex challenges like this one with JEB!<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<p><\/p>\n\n\n<div class='footnotes' id='footnotes-3867'><div class='footnotedivider'><\/div><ol><li id='fn-3867-1'> While JEB&#8217;s default decompiled code follows (most of) C&#8217;s syntactic rules and their semantics, some custom operators might be inserted to represent low-level operations and ease reading; hence, strictly speaking, JEB&#8217;s decompiled code should be called <em>pseudo<\/em>-C. The decompiled output can also be a variant of C, e.g. the Ethereum decompiler produces pseudo-Solidity code. <span class='footnotereverse'><a href='#fnref-3867-1'>&#8617;<\/a><\/span><\/li><li id='fn-3867-2'> SHA1 of the UPX-packed executable: fea9d1b1eb9d3f93cea6749f4a07ffb635b5a0bc <span class='footnotereverse'><a href='#fnref-3867-2'>&#8617;<\/a><\/span><\/li><li id='fn-3867-3'> Implementing a complete CFG on decompiled C code will likely be done in future versions of JEB, in order to provide more complex C optimizations. 
<span class='footnotereverse'><a href='#fnref-3867-3'>&#8617;<\/a><\/span><\/li><li id='fn-3867-4'> The actual implementation is more complex than that, e.g. it has to deal with pointer dereferencing; refer to <a href=\"https:\/\/github.com\/pnfsoftware\/jeb-c-emulator-plugin\/blob\/7d17a44ecf6f58b4e777452aef54178f7934f470\/src\/com\/pnf\/plugin\/cemulator\/SimpleCEmulator.java#L138\"><code>emulateStatement()<\/code><\/a> for details. <span class='footnotereverse'><a href='#fnref-3867-4'>&#8617;<\/a><\/span><\/li><li id='fn-3867-5'> Dumping memory was done with <a href=\"https:\/\/github.com\/longld\/peda\">peda for GDB<\/a>, using the commands <code>dumpmem stack.mem stack<\/code> and <code>dumpmem heap.mem heap<\/code> <span class='footnotereverse'><a href='#fnref-3867-5'>&#8617;<\/a><\/span><\/li><\/ol><\/div>","protected":false},"excerpt":{"rendered":"<p>Disclaimer: a long time ago in our galaxy, we published part 1 of this blog post; then we decided to wait for the next major release of JEB decompiler before publishing the rest. A year and a half later, JEB 4.0 is finally out! 
So it is time for us to publish our complete adventure &hellip; <a href=\"https:\/\/www.pnfsoftware.com\/blog\/traveling-around-mars-with-c-emulation\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Traveling Around Mars With C Emulation<\/span><\/a><\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3,22,13,5],"tags":[],"class_list":["post-3867","post","type-post","status-publish","format-standard","hentry","category-decompilation","category-jeb4","category-native-code","category-obfuscation"],"_links":{"self":[{"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/posts\/3867","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/comments?post=3867"}],"version-history":[{"count":0,"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/posts\/3867\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/media?parent=3867"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/categories?post=3867"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/tags?post=3867"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}