All posts by Joan Calvet

Android NDK Libraries Signatures

In this blog post, we present a new batch of native signatures released with JEB3 to identify Android Native Development Kit (NDK) libraries.

First, let’s briefly give some context. The Android NDK is a set of tools allowing developers to embed compiled C/C++ code into their Android applications. Thus, developers can integrate existing native code libraries, develop performance-sensitive code in C/C++ or obfuscate algorithms with native code protectors.

In practice, native code within Android applications comes in the form of ELF shared libraries (“.so”); the native methods can then be called from Java using Java Native Interface (JNI), which we described in a previous blog post.

NDK Pre-Built Libraries

The Android NDK provides some pre-built libraries that can be linked against, for example several C++ Standard Template Library (STL) implementations [1] or the zlib compression library.

As an example, let’s compile a “hello world” Android NDK C++ library with NDK r17. The C++ implementation will be gnustl, which was the default choice before NDK r18.

Here is the C++ code:

When compiled with Android Studio’s default settings, libraries are linked dynamically; since libgnustl_shared.so is not a system library, it is bundled directly in the application for each supported Application Binary Interface (ABI).

File hierarchy of the Android application containing our “hello world” native library

If we open the ARM library, we can fairly easily understand the (already convoluted) logic of our “hello world” routine, thanks to the names of the external gnustl API calls:

Control-flow graph of ARM “hello world” with gnustl dynamically linked. Note that JEB displays mangled names when API calls correspond to external routines.

Now, the Android NDK also provides static versions of most of its pre-built libraries. A developer, especially a malware developer wishing to hinder analysis, might prefer to use those.

When linked statically, the gnustl library is included in our native library, and here is our “hello world” routine:

Control-flow graph of ARM “hello world” with gnustl statically linked. Subroutines bear no specific names.

In this case, the analysis will be slowed down by the numerous routine calls with no specific names; each of these subroutines will need to be examined to understand the whole purpose of the routine.

This brings us to a common reverse-engineering problem: is there a way to automatically identify and rename static library code, such that the analyst can focus on the application code?

JEB3 NDK Signatures

That’s when JEB native signatures come to the rescue! Indeed, JEB3 now provides signatures for the following Android NDK static libraries:

  • gnustl
  • libc++
  • STLport
  • libc
  • libmath
  • zlib

We provide signatures for the ARM/ARM64 ABIs of these libraries (including all variants, such as arm-v7a, arm-v7a-hard, Thumb or ARM mode, etc.), from NDK r10 to NDK r18.

These signatures are built in a similar fashion to our x86/x64 Visual Studio native signatures, and are intended to be “false-positive free”, which means a match can be trusted blindly. Note that JEB users can also create their own signatures directly from the UI.

So, if we open our statically linked library in JEB with the signatures loaded, the gnustl library routines are identified and renamed:

Control-flow graph of ARM “hello world” with gnustl statically linked and NDK signatures loaded. Subroutines have been renamed.

Note: the attentive reader might have noticed some “unk_lib_subX” routines in the previous image. These names correspond to cases where several library routines match the same routine. The user can then see the conflicting names in the target routine and pick the most suitable one.

Due to the continuous evolution of compilers and libraries, providing up-to-date and useful signatures is not an easy task, but we hope this first NDK release will help our users. More libraries should certainly be signed in the future, and we encourage users to comment on that (email, Twitter, Slack).

  1.  NDK C++ support is a turbulent story, to say the least. Historically, different C++ implementations have been provided with the NDK (gnustl, STLport, libc++, …), each of them coming with a different set of features (exception handling, RTTI, …). Since the very recent r18 release (September 2018), Android developers must use libc++ only.

JEB3 Auto-Signing Mode

In this video we introduce a novel JEB 3.0 feature: auto-signing mode for native code.

In a nutshell, when this mode is activated, all modifications made by users to native code in JEB (renaming a routine, adding a comment, etc.) are “signed”.

The newly created signatures can then be loaded against another executable, and all the information of the original analysis will be imported if the same code is recognized. Therefore, the user only needs to analyze each routine once.
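To make the idea concrete, here is a minimal Python sketch of what auto-signing could look like conceptually, assuming each routine can be reduced to a signature string (computing that signature is out of scope here). The Annotations and AutoSigner classes and their methods are hypothetical illustrations, not the JEB API: user edits are recorded against the routine signature, then replayed on any routine of another executable that has the same signature.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Annotations:
    """User modifications attached to one routine (hypothetical model)."""
    name: Optional[str] = None
    comments: Dict[int, str] = field(default_factory=dict)  # offset in routine -> comment


class AutoSigner:
    """Records user edits keyed by routine signature, then replays them elsewhere."""

    def __init__(self) -> None:
        self.store: Dict[str, Annotations] = {}  # routine signature -> annotations

    def on_rename(self, signature: str, new_name: str) -> None:
        self.store.setdefault(signature, Annotations()).name = new_name

    def on_comment(self, signature: str, offset: int, text: str) -> None:
        self.store.setdefault(signature, Annotations()).comments[offset] = text

    def apply_to(self, routines: Dict[int, str]) -> Dict[int, Annotations]:
        """Map routine addresses of another executable (address -> signature)
        to the annotations recorded for the same signature, if any."""
        return {addr: self.store[sig] for addr, sig in routines.items() if sig in self.store}
```

Keying everything by signature rather than by address is what lets an analysis done once carry over to any executable containing the same code.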

Without further ado, here is the video, which begins by introducing native signatures before showcasing auto-signing:

As usual, feel free to reach out to us (email, Twitter, Slack) if you have questions or suggestions.

Having Fun with Obfuscated Mach-O Files

Last week was the release of JEB 2.3.7, with a brand new parser for Mach-O, the executable file format of Apple’s macOS and iOS operating systems. This file format, like its cousins PE and ELF, contains a lot of technical peculiarities, and implementing a reliable parser is not a trivial task.

During the journey leading to this first Mach-O release, we encountered some interesting executables. This short blog post is about one of them, which uses some Mach-O features to make reverse-engineering harder.

Recon

The executable in question belongs to a well-known adware family dubbed InstallCore, which is usually bundled with other applications to display ads to users.

The sample we will be using in this post is the following:

57e4ce2f2f1262f442effc118993058f541cf3fd: Mach-O 64-bit x86_64 executable

Let’s first take a look at the Mach-O sections:

Figure 1 – Mach-O sections

Interestingly, there are some sections related to the Objective-C language (“__objc_…”). Roughly summarized, Objective-C was the main programming language for OS X and iOS applications prior to the introduction of Swift. It adds object-oriented features on top of C, and it can be difficult to analyze at first, in particular because of the way it calls methods, by “sending messages”.

Nevertheless, the good news is that Objective-C binaries usually come with a lot of metadata describing methods and classes, which the Objective-C runtime uses to implement message passing. This metadata is stored in the “__objc_…” sections previously mentioned, and the JEB Mach-O parser processes it to find and properly name Objective-C methods.
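To illustrate what this metadata looks like, here is a minimal Python sketch that decodes one Objective-C method list, assuming the classic 64-bit little-endian method_list_t layout of the Objective-C runtime (a uint32 entsizeAndFlags, a uint32 count, then entries of three 64-bit pointers: name, types, imp). Newer runtimes also support relative method lists, which this sketch does not handle. The function parse_method_list and its resolve_string callback are hypothetical; a real parser would resolve pointers through the Mach-O segment mappings.

```python
import struct
from typing import Callable, List, Tuple

# Low bits of entsizeAndFlags carry flags in the classic layout (assumption).
FLAGS_MASK = 0x3


def parse_method_list(data: bytes,
                      resolve_string: Callable[[int], str]) -> List[Tuple[str, str, int]]:
    """Return (selector name, type encoding, implementation address) tuples.

    resolve_string maps a virtual address (e.g. into __objc_methname) to the
    C string stored there; it is supplied by the caller.
    """
    entsize_and_flags, count = struct.unpack_from("<II", data, 0)
    entsize = entsize_and_flags & ~FLAGS_MASK
    methods = []
    for index in range(count):
        # Each entry: SEL name, const char *types, IMP imp (64-bit pointers).
        name_ptr, types_ptr, imp = struct.unpack_from("<QQQ", data, 8 + index * entsize)
        methods.append((resolve_string(name_ptr), resolve_string(types_ptr), imp))
    return methods
```

The imp field of each entry is what becomes a method entry point during analysis, so a tool trusting this metadata will disassemble whatever that pointer designates.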

After the initial analysis, JEB leaves us at the entry point of the program (the yellow line below):

Figure 2 – Entry point

Wait a minute… there is no routine here and it is not even correct x86-64 machine code!

Most of the detected routines do not look good either; first, there are a few Objective-C methods with random-looking names like this one:

Figure 3 – Objective-C method

Again the code makes very little sense…

Then come around 50 native routines, whose code clearly cannot be executed “as is” either, for example:

Figure 4 – Native routine

Moreover, there are no cross-references to any of these routines! Why would the JEB disassembler engine, which follows a recursive algorithm combined with heuristics, even think there are routines here?!

Time for a Deep Dive

Code Versus Data

First, let’s deal with the numerous unreferenced routines containing no correct machine code. After some digging, we found that they are declared in the LC_FUNCTION_STARTS Mach-O command (“command” being the Mach-O term for an entry in the file header).

This command provides a table containing function entry points in the executable. It allows, for example, debuggers to know function boundaries without symbols. At first, this may seem like a blessing for program analysis tools, because distinguishing code from data in a stripped executable is usually a hard problem, to say the least. Hence JEB, like other analysis tools, uses this command to enrich its analysis.
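As a sketch of how such a table can be read, assuming the usual encoding of the LC_FUNCTION_STARTS payload (a sequence of ULEB128 values, each one a delta from the previous function start, the first one relative to the start of the __TEXT segment, terminated by a zero), the following Python functions turn the raw payload into absolute addresses. The function names are ours, and locating the payload in the file (through the command’s dataoff and datasize fields) is left out.

```python
from typing import List, Tuple


def read_uleb128(data: bytes, offset: int) -> Tuple[int, int]:
    """Decode one unsigned LEB128 value; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):  # high bit clear: last byte of this value
            return result, offset
        shift += 7


def decode_function_starts(data: bytes, text_vmaddr: int) -> List[int]:
    """Convert the LC_FUNCTION_STARTS payload into absolute function addresses."""
    addresses = []
    address = text_vmaddr
    offset = 0
    while offset < len(data):
        delta, offset = read_uleb128(data, offset)
        if delta == 0:  # a zero delta terminates the table
            break
        address += delta
        addresses.append(address)
    return addresses
```

Nothing in this decoding checks that the resulting addresses point to actual code: a tool consuming the table gets whatever entry points the file declares.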

But this gift from Mach-O comes with a drawback: nothing prevents miscreants from declaring function entry points where there are none, and analysis tools will end up analyzing random data as code.

In this binary, none of the routines declared in the LC_FUNCTION_STARTS command is actually executable. Knowing that, we can simply remove the command from the Mach-O header (i.e. nullify the entry) and ask JEB to re-analyze the file, to make the disassembly easier to read. We end up with a much shorter routine list:
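The exact way the entry was nullified is not detailed here; one possible way, sketched below in Python under that assumption, is to walk the load commands of a little-endian 64-bit Mach-O file and set the datasize field of LC_FUNCTION_STARTS (0x26) to zero, so the table becomes empty while the rest of the header stays consistent. The function name is hypothetical, and fat (universal) binaries are not handled.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF
LC_FUNCTION_STARTS = 0x26
MACH_HEADER_64_SIZE = 32


def neutralize_function_starts(image: bytearray) -> bool:
    """Zero the datasize of LC_FUNCTION_STARTS in place; return True if found."""
    magic, _cputype, _subtype, _filetype, ncmds, _sizeofcmds, _flags, _reserved = \
        struct.unpack_from("<8I", image, 0)
    if magic != MH_MAGIC_64:
        raise ValueError("not a little-endian 64-bit Mach-O file")
    offset = MACH_HEADER_64_SIZE
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", image, offset)
        if cmd == LC_FUNCTION_STARTS:
            # linkedit_data_command layout: cmd, cmdsize, dataoff, datasize
            struct.pack_into("<I", image, offset + 12, 0)
            return True
        offset += cmdsize
    return False
```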

Figure 5 – Routine list

The remaining routines are mostly Objective-C methods declared in the metadata. Once again, nothing prevents developers from forging this metadata to declare method entry points in data. For now, let’s keep those methods here and focus on a more pressing question…

Where Is the Entry Point?

The entry point value used by JEB comes from the LC_UNIXTHREAD command contained in the Mach-O header, which specifies a CPU state to load at startup. How could this program even be executable if the declared entry point is not correct machine code (see Figure 2)?
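For reference, here is a minimal Python sketch that reads the initial instruction pointer from LC_UNIXTHREAD (0x5) in a little-endian x86-64 Mach-O file. It assumes the command carries a single x86_THREAD_STATE64 flavor (4), whose state is a sequence of 64-bit registers in which rip comes right after rax through r15 (index 16); the function name is hypothetical.

```python
import struct
from typing import Optional

MH_MAGIC_64 = 0xFEEDFACF
LC_UNIXTHREAD = 0x5
X86_THREAD_STATE64 = 4
RIP_INDEX = 16  # rax..r15 (16 registers) precede rip in x86_thread_state64_t


def unixthread_entry_point(image: bytes) -> Optional[int]:
    """Return the initial rip declared by LC_UNIXTHREAD, or None if absent."""
    magic = struct.unpack_from("<I", image, 0)[0]
    if magic != MH_MAGIC_64:
        raise ValueError("not a little-endian 64-bit Mach-O file")
    ncmds = struct.unpack_from("<I", image, 16)[0]
    offset = 32  # size of mach_header_64
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", image, offset)
        if cmd == LC_UNIXTHREAD:
            flavor, count = struct.unpack_from("<II", image, offset + 8)
            if flavor == X86_THREAD_STATE64:
                # count is expressed in 32-bit words; registers are 64-bit.
                registers = struct.unpack_from("<%dQ" % (count // 2), image, offset + 16)
                return registers[RIP_INDEX]
        offset += cmdsize
    return None
```

This value is only the address the loader jumps to once initialization is over; code that runs earlier during initialization is not reflected here.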

Surely, there has to be another entry point that is executed first. There is one indeed, and it has to do with the way the Objective-C runtime initializes classes. An Objective-C class can implement a method named “+load” (the + means it is a class method rather than an instance method), which is called during executable initialization, that is, before the program’s main() function is executed.

If we look back at Figure 5, we see that among the random-looking method names there is one class with this famous +load method, and here is the beginning of its code:

Figure 6 – +load method

Finally, some decent looking machine code! We just found the real entry point of the binary, and now the adventure can really begin…

That’s it for today, stay tuned for more technical sweetness on JEB blog!

Automatic Identification of Mirai Original Code

Context

One of the major threats to embedded devices (the so-called “Internet of Things”) is the infamous Mirai malware, whose source code was made public in September 2016. This malware has the ability to infect devices by brute-forcing Telnet credentials, and is primarily used to launch distributed denial-of-service attacks.

Since the source code release, numerous Mirai variants have been deployed in the wild by miscreants, like the one we documented in a recent post.

In this blog post, we will first take a quick look at another Mirai-based malware, quite original in its own way, and then introduce our novel signature system that can identify Mirai original code in executables.

Yet Another Mirai Variant

On May 18th, ESET’s Michal Malík mentioned on Twitter a Mirai-based sample for MIPS that grabbed our attention. Michal pointed out new functionalities like a custom update mechanism and some strange debug routines, so we decided to take a look with our brand new MIPS decompiler. It should be noted that this sample comes with debug symbols, which explains the names present in the decompiler output.

The malware logic starts in its main() routine, which is shown below as decompiled by JEB.

Briefly summarized, this routine first sets up a few signal handlers, in particular to create a core file in case of a segmentation fault. It then calls a homemade panic() function (not to be confused with the standard Linux panic() routine). The panic() function code is shown below, as seen in JEB.

While the routine’s native code (left side) can be pretty dry to read, the decompiled code (right side) is fairly straightforward: a file named file.txt is opened and a given error message is written to it, along with a custom system footprint built by the footprint12() routine.

Finally, main() calls the kill_run_mobile1() function, which first kills any application listening on TCP port 18899 (likely other instances of the same malware), and then creates a thread on the mobile_loop1() function, which is shown below.

The new thread will listen for incoming connections and process them through a custom command handler. As can be seen from the numerous debug messages in the decompiled code, the code is still in a development stage.

To summarize, this sample appears to be an attempt to repackage Mirai source code with a different update mechanism, and is still in development, as can be seen from the presence of debug routines, and the fact that plenty of code remains unused.

While the technical quality of this sample is dubious, it illustrates one of the major consequences of the Mirai source code public release: it has lowered the barrier to entry for malware developers. In particular, we can expect the number of Mirai-based malware strains to keep growing in the coming months.

Native Code Signatures

In a context where numerous Mirai-based malware samples are deployed in the wild, the ability to identify original Mirai code becomes particularly useful, as it allows the analyst to focus only on the new functionalities of each sample.

Of course, most Mirai-based samples do not come with symbols, and hence we need a proper mechanism to identify Mirai original code. That is the purpose of the native signature system released with JEB 2.3, which can identify code for all native architectures supported by JEB (x86, ARM, MIPS and the associated variants).

The objective of this signature system is to identify native routines with a minimal number of false positives. In other words, we want to fully trust a successful identification, at the cost of missing some known routines.

To achieve this low false-positive goal, our signatures are primarily based on two features:

  • A custom hash computed on the binary code of the unknown routine. During this computation, we remove from the native instructions the addresses and offsets that may vary depending on where the routine is located in a binary; hence the same routine located at a different place will have the same hash. Interestingly, as our algorithm uses the generic JEB interface on native instructions (IInstruction), the hash is computed in the same way on all architectures.
  • The names of the routines called by the unknown routine, e.g. API routines, system calls, or already identified routines. This feature makes it possible to distinguish wrappers that have exactly the same binary code but call different routines.
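The two features can be sketched in Python as follows. This is a simplified illustration, not the JEB implementation: the Instruction and Operand classes are a hypothetical stand-in for the IInstruction interface, and SHA-256 stands in for the custom hash. Operands tagged "addr" (addresses and offsets that depend on the routine location) are masked before hashing.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Operand:
    kind: str      # "reg", "imm", or "addr" (location-dependent address or offset)
    value: object


@dataclass(frozen=True)
class Instruction:
    mnemonic: str
    operands: Tuple[Operand, ...]
    callee: Optional[str] = None  # name of the called routine, when known


def routine_hash(instructions: Sequence[Instruction]) -> str:
    """First feature: hash of the code with location-dependent operands masked."""
    digest = hashlib.sha256()
    for insn in instructions:
        parts = [insn.mnemonic]
        for op in insn.operands:
            # Masking addresses/offsets makes the hash independent of where
            # the routine sits in the binary.
            parts.append("addr" if op.kind == "addr" else f"{op.kind}:{op.value}")
        digest.update(("|".join(parts) + "\n").encode())
    return digest.hexdigest()


def callee_names(instructions: Sequence[Instruction]) -> Tuple[str, ...]:
    """Second feature: names of the called routines, in order."""
    return tuple(insn.callee for insn in instructions if insn.callee is not None)
```

For instance, two call-and-return wrappers calling malloc and free from different addresses produce the same routine_hash; only callee_names tells them apart.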

The whole signature process can be summarized in two steps, which will be described in detail in separate documentation:

  1. Signatures are generated from a reference file. This file can be a native file with symbols, or a JEB database with some routines renamed by the user. For each named routine, a signature containing the routine features and information is created. Signatures are then grouped into packages for each platform.
  2. When JEB analyzes an unknown routine, it tries to match it with the signatures. If there is a match, the information of the original routine is imported, e.g. the matched unknown routine is renamed after the original routine.
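The two steps can be sketched in Python as a lookup table keyed by both features (code hash and called routine names). The function names build_package and match are hypothetical, and the handling of ambiguous keys, where several reference routines share identical features, is our assumption: the sketch refuses to rename in that case, in line with the goal of fully trusting every successful identification.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# A signature key combines both features: (code hash, names of called routines).
SignatureKey = Tuple[str, Tuple[str, ...]]


def build_package(reference: Dict[str, SignatureKey]) -> Dict[SignatureKey, List[str]]:
    """Step 1: one signature per named routine of the reference file."""
    package: Dict[SignatureKey, List[str]] = defaultdict(list)
    for name, key in reference.items():
        package[key].append(name)
    return dict(package)


def match(package: Dict[SignatureKey, List[str]], key: SignatureKey) -> Optional[str]:
    """Step 2: original name for an unknown routine, or None.

    If several reference routines share the same features, the match is
    ambiguous and nothing is renamed, to avoid false positives.
    """
    names = package.get(key, [])
    return names[0] if len(names) == 1 else None
```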

Due to its strict reliance on the binary code, this identification process is not resistant to even minor changes, like the ones introduced by compiling with a different compiler version or different optimizations. We intend to develop other signature systems in JEB that will be more resistant to such variations, in particular by using the JEB intermediate representation.

Still, it is particularly suitable in the case of Mirai, where the public source code comes with compilation instructions, so that many samples are compiled in the same way and share the exact same binary code. Therefore, JEB 2.3 comes with a set of signatures created from a non-stripped executable compiled from the Mirai public source code.

These signatures are automatically applied when a MIPS binary is loaded in JEB. For example, here is an extract of the initial routine list after loading into JEB a stripped Mirai sample deployed last year (SHA1: 03ecd3b49aa19589599c64e4e7a51206a592b4ef).

Of the 204 routines contained in the sample, 120 are automatically identified and renamed by JEB, allowing the user to focus on the unknown routines. Note that not all recognized routines belong to Mirai-specific code; some of them belong to the C library used by Mirai (uClibc).

Conclusion

The JEB native signature system is still in development, but its results are encouraging, and we provide a set of signatures for Mirai on the MIPS platform and for the standard C library shipped with Microsoft Visual Studio 2013 on the x86 platform. We encourage users to try it through our demo version and to report any comments to support@pnfsoftware.com.

In the coming weeks, not only will the number of signatures grow rapidly (through a dedicated update mechanism), but we also intend to let users generate their own signatures with the JEB public API.

Acknowledgement

The malicious software analysis presented in this post was done by our intern Hugo Genesse.

Analyzing a New MIPS IoT Malware With JEB

Over the last few months, several major vulnerabilities in a certain brand of IP cameras have been publicly disclosed. One vulnerability allows remote code execution, while another permits the retrieval of the administrator’s credentials. The situation is made worse by the fact that many of these cameras are reachable on the Internet (around 185,000 according to one of the researchers).

It did not take long for miscreants to abuse this discovery, and a new piece of malware [1] was recently propagated through the vulnerable cameras, as described in a 360.cn blog post.

This malicious software comes with MIPS and ARM versions, so we decided to quickly analyze it using our brand new MIPS decompiler. This blog post describes our findings.

Note: as the JEB MIPS decompiler is in beta, the decompiled output presented in this blog post should be considered with caution; we provide it mainly to give the reader an idea of JEB capabilities. As we are constantly refining the decompiler, the produced code will evolve significantly in the next few months.

Recon

The sample we will be analyzing is the following:

7A0485E52AA09F63D41E471FD736584C06C3DAB6: ELF 32-bit MSB executable, MIPS, MIPS-I version 1 (SYSV), statically linked, stripped

After opening it in JEB, our disassembler found 526 routines. To give the reader an idea, here is what it looks like at the program entry point:

We can see here the disassembled MIPS code, which can be hard to read, to say the least. Fortunately, JEB is able to decompile it, as shown below (names are our own):

The main() routine is where the malware logic lies, and will be described below.

The interested reader might have noticed the comments in the assembly code. Those comments are the result of what we call the “advanced analysis” step, which can be roughly described as an emulation of the native code (based on the JEB custom intermediate representation). This makes it possible to find the actual values manipulated by the code when those values result from previous computations. The advanced analysis will be properly described in a separate blog post.
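As a toy illustration of the idea, here is a Python sketch of forward constant propagation over a handful of MIPS instructions (lui, ori, addiu, move), producing the kind of value comments described above. The JEB advanced analysis works on its intermediate representation and is far more general. Instructions are hypothetical (mnemonic, operands) tuples, and any other instruction is assumed to overwrite its first operand, a conservative simplification that only loses information (stores, for instance, do not actually write their first operand).

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed immediate."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def propagate_constants(instructions):
    """Return, for each instruction, a comment with the destination value if known.

    instructions: list of (mnemonic, operands) tuples, e.g. ("lui", ("$a0", 0x41)).
    """
    regs = {"$zero": 0}
    comments = []
    for mnemonic, ops in instructions:
        dest = ops[0] if ops else None
        if mnemonic == "lui":
            regs[dest] = (ops[1] << 16) & MASK32
        elif mnemonic == "ori" and ops[1] in regs:
            regs[dest] = regs[ops[1]] | (ops[2] & 0xFFFF)  # zero-extended immediate
        elif mnemonic == "addiu" and ops[1] in regs:
            regs[dest] = (regs[ops[1]] + sign_extend16(ops[2])) & MASK32  # sign-extended
        elif mnemonic == "move" and ops[1] in regs:
            regs[dest] = regs[ops[1]]
        else:
            # Unsupported instruction or unknown source: conservatively forget dest.
            regs.pop(dest, None)
        regs["$zero"] = 0  # $zero is hardwired to 0
        comments.append(f"{dest} = {regs[dest]:#010x}" if dest in regs else None)
    return comments
```

For example, a lui $a0, 0x41 followed by addiu $a0, $a0, 0xFFF0 yields the comment $a0 = 0x0040fff0, a value that appears nowhere as a single immediate in the code.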

But before going on with the analysis, one might want to take a look at the strings used by the malware, to get a sense of its abilities:

We can observe some likely C&C server information, and various strings related to the malware’s network capabilities. Interestingly, an Arabic string clearly stands out from the others; it can be translated as “Loading Version 1”.

A final preparation step is to look at the system calls made by the malicious software, as they make it easy to understand the behavior of some routines. JEB automatically names such syscalls, rather than just showing the system call number resulting from the advanced analysis phase, and displays them in a separate panel:

The user can then jump to these syscall references, and rename them appropriately, as done in the following example:

Through this process we renamed around 60 routines that are simply wrappers for syscalls.
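Recognizing such wrappers can be sketched as follows, assuming the Linux MIPS o32 convention where the system call number is placed in $v0 before the syscall instruction, with numbers starting at 4000. The table below is a small subset of the o32 system call table; the sys_ naming scheme and the (mnemonic, operands) instruction tuples are our own hypothetical conventions, and JEB performs this identification by itself.

```python
from typing import Optional

# Subset of the Linux MIPS o32 system call table (numbers start at 4000).
O32_SYSCALLS = {
    4001: "exit", 4002: "fork", 4003: "read", 4004: "write", 4005: "open",
    4006: "close", 4009: "link", 4010: "unlink", 4020: "getpid", 4037: "kill",
    4170: "connect", 4183: "socket",
}


def syscall_number(instructions) -> Optional[int]:
    """Return the constant loaded into $v0 before the first `syscall`, or None."""
    v0 = None
    for mnemonic, ops in instructions:
        if mnemonic == "syscall":
            return v0
        if ops and ops[0] == "$v0":
            if mnemonic == "li":                               # li $v0, N
                v0 = ops[1]
            elif mnemonic == "addiu" and ops[1] == "$zero":    # addiu $v0, $zero, N
                v0 = ops[2]
            else:
                v0 = None  # $v0 overwritten by something we do not track
    return None


def wrapper_name(instructions) -> Optional[str]:
    """Suggest a name for a syscall wrapper routine, e.g. 'sys_unlink'."""
    number = syscall_number(instructions)
    if number is None:
        return None
    return "sys_" + O32_SYSCALLS.get(number, str(number))
```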

Our reconnaissance step being done, we can now dig into the malware core logic!

Workflow

We start at the main() routine previously mentioned, and describe here the main steps of this malicious software. As we will see, part of this malware code is borrowed from the infamous Mirai malware, whose source code was made public in September 2016.

Initialization

At startup, the malware performs a few initialization steps, most of them directly copy-pasted from Mirai. There is one original action, though, which can be seen in the following image:

The files /tmp/ftpupdate.sh and /tmp/ftpupload.sh are first removed, then linked to /dev/null. These two files are used by various exploits against these IP cameras, and hence the malware makes sure a newly infected device cannot be infected again.

C&C Commands

The malware then enters a loop to fetch 1-byte commands from the C&C server (whose domain name is hardcoded). We counted 8 different commands, some of them having subcommands. We will now describe the most interesting ones.

Infection

As previously explained, this malware propagates by infecting vulnerable IP cameras connected to the Internet. To do so, it first scans the Internet for these devices, by re-using the TCP SYN scanner of the Mirai malware. To illustrate that, here is the scanner initialization loop, as seen in the released Mirai source code and in the decompiled code of our malware:

Scanner code, as seen in Mirai source code…

… versus the new malware code decompiled by JEB

The only major difference is that the TCP destination port is fixed to 81 in our malicious software, rather than alternating between ports 23 and 2323 as in Mirai. It is worth noting that even the loop counter has the same value (SCANNER_RAW_PPS is set to 160 in the Mirai source code).

If the malware finds a device with port 81 open, it then launches the actual exploit, which is built from a combination of publicly known vulnerabilities in the IP camera web server:

  1. Extract the device administrator’s credentials by sending an HTTP request for the file login.cgi and then parsing the answer for the administrator login and password (documented here).
  2. Send two specially crafted HTTP requests to first plant a connect-back payload on the device, and then execute it (documented here). The sending of this first request is shown below, as seen in JEB:

Once the connection has been established with the miscreants’ server thanks to the connect-back payload, the newly infected device is asked to download and run the malicious software, as described in the 360.cn blog post.

Attack Routers

Another action possibly ordered by the C&C server is to scan for UPnP-enabled devices in order to add a port forwarding entry to them. Such UPnP devices typically include home routers.

To do so, the malicious software starts to repeatedly send UPnP discovery messages to random IP addresses:
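For reference, an SSDP discovery request follows the M-SEARCH format defined by the UPnP Device Architecture; here is a minimal Python sketch that only builds such a message, with no network code. The default values (multicast address 239.255.255.250, port 1900, search target ssdp:all, MX of 3) are the standard ones, not values extracted from the sample, which sends its messages to random IP addresses rather than to the multicast group.

```python
def build_msearch(host: str = "239.255.255.250", port: int = 1900,
                  search_target: str = "ssdp:all", mx: int = 3) -> bytes:
    """Build an SSDP M-SEARCH discovery request (HTTP-like text over UDP)."""
    lines = [
        "M-SEARCH * HTTP/1.1",
        f"HOST: {host}:{port}",
        'MAN: "ssdp:discover"',   # the quotes are part of the header value
        f"MX: {mx}",              # maximum wait time in seconds for responses
        f"ST: {search_target}",   # search target
        "",
        "",                       # empty line terminating the headers
    ]
    return "\r\n".join(lines).encode("ascii")
```

Devices that answer send back an HTTP-like response whose LOCATION header points to their XML description, from which the control URL used for SOAP requests can be obtained.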

Once a UPnP-enabled device has been found, a SOAP request is forged to add a new port forwarding entry to its configuration:

As mentioned in another 360.cn blog post, this code may be used to exploit the CVE-2014-8361 vulnerability, which allows executing system commands with root privileges through the <NewInternalClient> SOAP tag. Also, notice the <NewPortMappingDescription> tag set to Skype, in an attempt to disguise the request.

UDP DDoS

As documented in the 360.cn blog, the malicious software can launch a denial-of-service attack over UDP. The packets are built from the SSDP discovery message, which may also serve as a preparation step for an SSDP reflection attack, though the code for such an attack does not appear to be present in the binary.

Interestingly, there is another denial of service attack implemented, using a 25-byte payload shown below:

This payload is used in amplification attacks through Valve Source Engine servers.

Conclusion

We hope the readers enjoyed this quick analysis; feel free to ask questions in the comments section below.

The JEB MIPS decompiler is currently in beta, and a demo version can be downloaded from our website.

  1. This malware was named http81 by 360 and Persirai by ESET, while other vendors simply detect it as a Mirai variant.