Analyzing Golang Executables

The Go programming language (also known as Golang) has gained popularity during the last few years among malware developers . This can certainly be explained by the relative simplicity of the language, and the cross-compilation ability of its compiler, allowing multi-platform malware development without too much effort.

In this blog post, we dive into Golang executables reverse engineering, and present a Python extension for JEB decompiler to ease Golang analysis; here is the table of content:

  1. Golang Basics for Reverse Engineers
  2. Making JEB Great for Golang
    1. Current Status
    2. Finding (and Naming) Routines
    3. Strings Recovery
    4. Types Recovery
  3. Use-Case: Analysis of StealthWorker Malware

The JEB Python script presented in this blog can be found on our GitHub page. Make sure to update JEB to version 3.7+ before running it.

Disclaimer: the analysis in this blog post refers to the current Golang version (1.13) and part of it might become outdated with future releases.

Golang Basics for Reverse Engineers

Feel free to skip this part if you’re already familiar with Golang reverse engineering.

Let’s start with some facts that reverse engineers might find interesting to know before analyzing their first Golang executable.

1. Golang is an open-source language with a pretty active development community. The language was originally created at Google around 2007, and version 1.0 was released in March 2012. Since then, two major versions are released each year.

2. Golang has a long lineage: in particular many low-level implementation choices — some would say oddities — in Golang can be traced back to Plan9, a distributed operating system on which some Golang creators were previously working.

3. Golang has been designed for concurrency, in particular by providing so-called “goroutines“, which are lightweight threads executing concurrently (but not necessarily in parallel).

Developers can start a new goroutine simply by prefixing a function call by go. A new goroutine will then start executing the function, while the caller goroutine returns and continues its execution concurrently with the callee. Let’s illustrate that with the following Golang program:

func myDummyFunc(){
	time.Sleep(1 * time.Second)
	fmt.Println("dummyFunc executed")
}

func main(){
	
	myDummyFunc() // normal call
	fmt.Println("1 - back in main")
	
	go myDummyFunc() // !! goroutine call
	fmt.Println("2 - back in main")

	time.Sleep(3 * time.Second)
}

Here, myDummyFunc() is called once normally, and then as a goroutine. Compiling and executing this program results in the following output:

dummyFunc executed
1 - back in main
2 - back in main
dummyFunc executed

Notice how the execution was back in main() before executing the second call to dummyFunc().

Implementation-wise, many goroutines can be executed on a single operating system thread. Golang runtime takes care of switching goroutines, e.g. whenever one executes a blocking system call. According to the official documentationIt is practical to create hundreds of thousands of goroutines in the same address space“.

What makes goroutines so “cheap” to create is that they start with a very limited stack space (2048 bytes — since Golang 1.4), which will be increased when needed.

One of the noticeable consequence for reverse engineers is that native routines (almost) all start with the same prologue. Its purpose is to check if the current goroutine’s stack is large enough, as can be seen in the following CFG:

Fig. 1: Simplified x86 CFG with Golang prologue for stack growth

When the stack space is nearly exhausted, more space will be allocated — actually, the stack will be copied somewhere with enough free space. This particular prologue is present only in routines with local variables.

How to distinguish a goroutine call from a “normal” call when analyzing a binary? Goroutine calls are implemented by calling runtime.newproc, which takes in input the address of the native routine to call, the size of its arguments, and then the actual routine’s arguments.

4. Golang has a concurrent garbage collector (GC): Golang’s GC can free memory while other goroutines are modifying it.

Roughly speaking, when the GC is freeing memory, goroutines report to it all their memory writes — to prevent concurrent memory modifications to be missed by the current freeing phase. Implementation-wise, when the GC is in the process of marking used memory, all memory writes pass through a “write barrier, which performs the write and informs the GC.

For reverse engineers this can result in particularly convoluted control flow graphs (CFG). For example, here is the CFG when a global variable globalString is set to newValue:

Fig. 2: Write to global variable globalString (x86 CFG):
before doing the memory write, the code checks if the write barrier is activated,
and if yes calls runtime.gcWriteBarrier()

Not all memory writes are monitored in that manner; the rules for write barriers’ insertion are described in mbarrier.go.

5. Golang comes with a custom compiler tool chain (parser, compiler, assembler, linker), all implemented in Golang. 1 2

From a developer’s perspective, it means that once Go is installed on a machine, one can compiled for any supported platform (making Golang a language of choice for IoT malware developers). Examples of supported platforms include Windows x64, Linux ARM and Linux MIPS (see “valid combinations of $GOOS and $GOARCH“).

From a reverse engineer’s perspective, the custom Go compiler toolchain means Golang binaries sometimes come with “exotic” features (which therefore can give a hard time to reverse engineering tools).

For example, symbols in Golang Windows executables are implemented using the COFF symbol table (while officiallyCOFF debugging information [for executable] is deprecated“). The Golang COFF symbol implementation is pretty liberal: symbols’ type is set to a default value — i.e. there is no clear distinction between code and data.

As another example, Windows PE read-only data section “.rdata” has been defined as executable in past Go versions.

Interestingly, Golang compiler internally uses pseudo assembly instructions (with architecture-specific registers). For example, here is a snippet of pseudo-code for ARM (operands are ordered with source first):

MOVW $go.string."hello, world\n"(SB), R0
MOVW R0, 4(R13)
MOVW $13, R0
MOVW R0, 8(R13)
CALL "".dummyFunc(SB)
MOVW.P 16(R13), R15


These pseudo-instructions could not be understood by a classic ARM assembler (e.g. there is no CALL instruction on ARM). Here are the disassembled ARM instructions from the corresponding binary:

LDR R0, #451404h // "hello, world\n" address
STR R0, [SP, #4]
MOV R0, #13
STR R0, [SP, #8]
BL main.dummyFunc
LDR PC, [SP], #16

Notice how the same pseudo-instruction MOVW got converted either as STR or MOV machine instructions. The use of pseudo-assembly comes from Plan9, and allows Golang assembler parser to easily handle all architectures: the only architecture-specific step is the selection of machine instructions (more details here).

6. Golang uses by default a stack-only calling convention.

Let’s illustrate that with the following diagram, showing the stack’s state when a routine with two integer parameters a and b, and two return values — declared in Go as “func myRoutine(a int, b int) (int, int)” — is called:

Fig. 3: Simplified stack view (stack grows downward), when a routine with two parameters and two return values is called . The return values are reserved slots for the callee.

It is the caller’s responsibilities to reserve space for the callees’ parameters and returned values, and to free it later on.

Note that Golang’s calling convention situation might soon change: since version 1.12, several calling conventions can coexist — the stack-only calling convention remaining the default one for backward compatibility reasons.

7. Golang executables are usually statically-linked, i.e. do not rely on external dependencies 3. In particular they embed a pretty large runtime environment. Consequently, Golang binaries tend to be large: for example, a “hello world” program compiled with Golang 1.13 is around 1.5MB with its symbols stripped.

8. Golang executables embed lots of symbolic information:

  • Debug symbols, implemented as DWARF symbols. These can be stripped at compilation time (command-line option -ldflags "-w") .
  • Classic symbols for each executable file format (PE/ELF/Mach-O). These can be stripped at compilation time (command-line option -ldflags "-s").
  • Go-specific metadata, including for example all functions’ entry points and names, and complete type information. These metadata cannot (easily) be stripped, because Golang runtime needs them: for example, functions’ information are needed to walk the stack for errors handling or for garbage collection, while types information serve for runtime type checks.

Of course, Go-specific metadata are very good news for reverse engineers, and parsing these will be one of the purpose of the JEB’s Python extension described in this blog post.

Making JEB Great for Golang

Current Status

What happens when opening a Golang executable in JEB? Let’s start from the usual “hello world” example:

package main

import "fmt"

func main() {
	fmt.Printf("hello, world\n")
}

If we compile it for as a Windows x64 PE file, and open it in JEB, we can notice that its code has only been partially disassembled. Unexplored memory areas can indeed be seen next to code areas in the native navigation bar (right-side of the screen by default):

Fig.4: Navigation bar for Golang PE file
(blue is code, green is data, grey represents area without any code or data)

We can confirm that the grey areas surrounding the blue areas are code, by manually disassembling them (hotkey ‘C’ by default).

Why did JEB disassembler miss this code? As can be seen in the Notifications window, the disassembler used a CONSERVATIVE strategy, meaning that it only followed safe control flow relationships (i.e. branches with known targets) 4.

Because Go runtime calls most native routines indirectly, in particular when creating goroutines, JEB disassembler finds little reliable control flow relationships, explaining why some code areas remain unexplored.

Before going on, let’s take a look at the corresponding Linux executable, which we can obtain simply by setting environment variable $GOOS to linux before compiling. Opening the resulting ELF file in JEB brings us in a more positive situation:

Fig. 5: Navigation bar for Golang ELF file
(blue is code, green is data, grey represents area without any code or data)

Due to the use by default of AGGRESSIVE strategy for disassembling ELF files, JEB disassembler found the whole code area (all code sections were linearly disassembled). In particular this time we can see our main routine, dubbed main.main by the compiler:

Fig. 6: Extract of main.main routine’s disassembly

Are data mixed with code in Golang executables? If yes, that would make AGGRESSIVE disassembly a risky strategy. At this moment (version 1.13 with default Go compiler), this does not seem to be the case:

– Data are explicitly stored in different sections than code, on PE and ELF.

Switch statements are not implemented with jumptables — a common case of data mixed with code, e.g. in Visual Studio or GCC ARM. Note that Golang provides several switch-like statements, as the select statement or the type switch statement.

As anything Golang related, the situation might change in future releases (for example, there is still an open discussion to implement jumptables for switch).

Yet, there is still something problematic in our ELF disassembly: the “hello world” string was not properly defined. Following the reference made by LEA instruction in the code, we reach a memory area where many strings have indeed been misrepresented as 1-byte data items:

Fig. 7: Dump of the memory area containing strings. Only the first byte of the strings is defined.

Now that we have a better idea of JEB’s current status, we are going to explain how we extended it with a Python script to ease Golang analysis.

Finding and Naming Routines

The first problem on our road is the incomplete control flow, specially on Windows executables. At first, it might seem that PE files disassembly could be improved simply by setting disassembler’s strategy to AGGRESSIVE, exactly as for ELF files. While it might be an acceptable quick solution, we can actually improve the control flow in a much safer way by parsing Go metadata.

Parsing “Pc Line Table”

Since version 1.2, Golang executables embed a structure called “pc line table”, also known as pclntab. Once again, this structure (and its name) is an heritage from Plan9, where its original purpose was to associate a program counter value (“pc”) to another value (e.g. a line number in the source code).

The structure has evolved, and now contains a function symbol table, which stores in particular the entry points and names of all routines defined in the binary. The Golang runtime uses it in particular for stack unwinding, call stack printing and garbage collection.

In others words, pclntab cannot be easily stripped from a binary, and provide us a reliable way to improve our disassembler’s control flow!

First, our script locates pclntab structure (refer to locatePclntab() for the details):

  # non-stripped binary: use symbol
  if findSymbolByName(golangAnalyzer.codeContainerUnit, 'runtime.pclntab') != None:
      pclntabAddress = findSymbolByName(..., 'runtime.pclntab')
  
  # stripped binary
  else:
    # PE: brute force search in .rdata. or in all binary if section not present
    if [...].getFormatType() == WellKnownUnitTypes.typeWinPe
    [...]
  
    # ELF: .gopclntab section if present, otherwise brute force search
    elif [...].getFormatType() == WellKnownUnitTypes.typeLinuxElf:
    [...]

On stripped binaries (i.e. without classic symbols), we search memory for the magic constant 0xFFFFFFFB starting pclntab, and then runs some checks on the possible fields. Note that it is usually easier to parse Golang ELF files, as important runtime structures are stored in distinct sections.

Second, we parse pclntab and use its function symbol table to disassemble all functions and rename them:

[...]
# enqueue function entry points from pclntab and register their names as labels
for myFunc in pclntab.functionSymbolTable.values():   
 nativeCodeAnalyzer.enqueuePointerForAnalysis(EntryPointDescription(myFunc.startPC), INativeCodeAnalyzer.PERMISSION_FORCEFUL)
 if rename:
   labelManager.setLabel(myFunc.startPC, myFunc.name, True, True, False)

# re-run disassembler with the enqueued entry points
self.nativeCodeAnalyzer.analyze()

Running this on our original PE file allows to discover all routines, and gives the following navigation bar:

Fig. 8: Navigation bar for Golang PE file after running the script
(blue is code, green is data, grey represents area without any code or data)

Interestingly, a few Golang’s runtime routines provide hints about the machine used to compile the binary, for example:

runtime.schedinit(): references Go’s build version. Knowing the exact version allows to investigate possible script parsing failures (as some internal structures might change depending on Go’s version).

runtime.GOROOT(): references Go’s installation folder used during compilation. This might be useful for malware tracking.

These routines are present only if the rest of the code relies on them. If it is the case, FunctionsFinder module highlights them in JEB’s console, and the user can then examine them.

The Remaining Unnamed Routines

Plot twist! A few routines found by the disassembler remain nameless even after FunctionsFinder module parsed pclntab structure. All these routines are adjacent in memory and composed of the same instructions, for example:

Fig. 9: Series of unnamed routines in x86

Long story short, these routines are made for zeroing or copying memory blobs, and are part of two large routines respectively named duff_zero and duff_copy.

These large routines are Duff’s devices made for zeroing/copying memory. They are generated as long unrolled loops of machine instructions. Depending on how many bytes need to be copied/zeroed the compiler will call directly on a particular instruction. For each of these calls, a nameless routine will then be created by the disassembler.

DuffDevicesFinder module identifies such routines with pattern matching on assembly instructions. By counting the number of instructions, it then renames them duff_zero_N/duff_copy_N, with N the number of bytes zeroed/copied.

Source Files

Interestingly, pclntab structure also stores original source filespaths. This supports various Golang’s runtime features, like printing meaningful stack traces, or providing information on callers from a callee (see runtime.Caller()). Here is an example of a stack trace obtained after a panic():

PANIC
goroutine 1 [running]:
main.main()
        C:/Users/[REDACTED]/go/src/hello_panic/hello_panic.go:4 +0x40

The script extracts the list of source files and print them in logs.

Strings Recovery

The second problem we initially encountered in JEB was the badly defined strings.

What Is a String?

Golang’s strings are stored at runtime in a particular structure called StringHeader with two fields:

type StringHeader struct {
        Data uintptr       // string value
        Len  int           // string size 
}

The string’s characters (pointed by the Data field) are stored in data sections of the executables, as a series of UTF-8 encoded characters without null-terminators.

Dynamic Allocation

StringHeader structures can be built dynamically, in particular when the string is local to a routine. For example:

Fig. 10: StringHeader instantiation in x86

By default JEB disassembler defines a 1-byte data item (gvar_4AFB52 in previous picture) for the string value, rather than a proper string, because:

  • As the string value is referenced only by LEA instruction, without any hints on the data type (LEA is just loading an “address”), the disassembler cannot type the pointed data accordingly.
  • The string value does not end with a null-terminator, making JEB’s standard strings identification algorithms unable to determine the string’s length when scanning memory.

To find these strings, StringsBuilder module searches for the particular assembly instructions usually used for instantiating StringHeader structures (for x86/x64, ARM and MIPS architectures). We can then properly define a string by fetching its size from the assembly instructions. Here is an example of recovered strings:

Of course, this heuristic will fail if different assembly instructions are employed to instantiate StringHeader structures in future Golang compiler release (such change happened in the past, e.g. x86 instructions changed with Golang 1.8).

Static Allocation

StringHeader can also be statically allocated, for example for global variables; in this case the complete structure is stored in the executable. The code referencing such strings employs many different instructions, making pattern matching not suitable.

To find these strings, we scan data sections for possible StringHeader structures (i.e. a Data field pointing to a printable string of size Len). Here is an example of recovered structures:

Fig. 13: Reconstructed StringHeader

The script employs two additional final heuristics, which scan memory for printable strings located between two already-defined strings. This allows to recover strings missed by previous heuristics.

When a small local string is used for comparison only, no StringHeader structure gets allocated. The string comparison is done directly by machine instructions; for example, CMP [EAX], 0x64636261 to compare with “abcd” on x86.

Types Recovery

Now that we extended JEB to handle the “basics” of Golang analysis, we can turn ourselves to what makes Golang-specific metadata particularly interesting: types.

Golang executables indeed embed descriptions for all types manipulated in the binary, including in particular those defined by developers.

To illustrate that, let’s compile the following Go program, which defines a Struct (Golang’s replacement for classes) with two fields:

package main

type DummyStruct struct{
	boolField bool
	intField int
}

func dummyFunc(s DummyStruct) int{
	return 13 * s.intField
}

func main(){
	s := DummyStruct{boolField: true, intField:37}
	t := dummyFunc(s)
	t += 1
}

Now, if we compile this source code as a stripped x64 executable, and analyze it with TypesBuilder module, the following structure will be reconstructed:

Fig. 14: Structure reconstructed by TypesBuilder, as seen in JEB’s type editor

Not only did we get the structure and its fields’ original names, but we also retrieved the structure’s exact memory layout, including the padding inserted by the compiler to align fields. We can confirm DummyStruct‘s layout by looking at its initialization code in main():

Fig. 15: DummyStruct initialization: intField starts at offset 8, as extracted from type information

Why So Much Information?

Before explaining how TypesBuilder parses types information, let’s first understand why these information are needed at all. Here are a few Golang features that rely on types at runtime:

  • Dynamic memory allocation, usually through a call to runtime.newobject(), which takes in input the description of the type to be allocated
  • Dynamic type checking, with statements like type assertions or type switches. Roughly speaking, two types will be considered equals if they have the same type descriptions.
  • Reflection, through the built-in package reflect, which allows to manipulate objects of unknown types from their type descriptions

Golang type descriptions can be considered akin to C++ Run-Time Type Information, except that there is no easy way to prevent their generation by the compiler. In particular, even when not using reflection, types descriptors remain present.

For reverse engineers, this is another very good news: knowing types (and their names) will help understanding the code’s purpose.

Of course, it is certainly doable to obfuscate types, for example by giving them meaningless names at compilation. We did not find any malware using such technique.

What Is A Type?

In Golang each type has an associated Kind, which can take one the following values:

const (
    Invalid Kind = iota
    Bool
    Int
    Int8
    Int16
    Int32
    Int64
    Uint
    Uint8
    Uint16
    Uint32
    Uint64
    Uintptr
    Float32
    Float64
    Complex64
    Complex128
    Array
    Chan
    Func
    Interface
    Map
    Ptr
    Slice
    String
    Struct
    UnsafePointer
)

Alongside types usually seen in programming languages (integers, strings, boolean, maps, etc), one can notice some Golang-specific types:

  • Array: fixed-size array
  • Slice: variable-size view of an Array
  • Func: functions; Golang’s functions are first-class citizens (for example, they can be passed as arguments)
  • Chan: communication channels for goroutines
  • Struct: collection of fields, Golang’s replacement for classes
  • Interface: collection of methods, implemented by Structs

The type’s kind is the type’s “category”; what identifies the type is its complete description, which is stored in the following rtype structure:

    type rtype struct {
      size       uintptr
      ptrdata    uintptr  // number of bytes in the type that can contain pointers
      hash       uint32   // hash of type; avoids computation in hash tables
      tflag      tflag    // extra type information flags
      align      uint8    // alignment of variable with this type
      fieldAlign uint8    // alignment of struct field with this type
      kind       uint8    // enumeration for C
      alg        *typeAlg // algorithm table
      gcdata     *byte    // garbage collection data
      str        nameOff  // string form
      ptrToThis  typeOff  // type for pointer to this type, may be zero
    }

The type’s name is part of its description (str field). This means that, for example, one could define an alternate integer type with type myInt int, and myInt and int would then be distinct types (with distinct type descriptors, each of Int kind). In particular, assigning a variable of type myInt to a variable of type int would necessitate an explicit cast.

The rtype structure only contains general information, and for non-primary types (Struct, Array, Map,…) it is actually embedded into another structure (as the first field), whose remaining fields provides type-specific information.

For example, here is strucType, the type descriptor for types with Struct kind:

    type structType struct {
      rtype
      pkgPath name          
      fields  []structField
    }

Here, we have in particular a slice of structField, another structure describing the structure fields’ types and layout.

Finally, types can have methods defined on them: a method is a function with a special argument, called the receiver, which describes the type on which the methods applies. For example, here is a method on MyStruct structure (notice receiver’s name after func):

func (myStruct MyStruct) method1() int{
    ...
}

Where are methods’ types stored? Into yet another structure called uncommonType, which is appended to the receiver’s type descriptor. In other words, a structure with methods will be described by the following structure:

type UncommonStructType struct {
      rtype
      structType
      uncommonType
}

Here is an example of such structure, as seen in JEB after running TypesBuilder module:

Fig. 16: Type descriptor for a structure with methods:
StrucType (with embedded rtype, and referencing StructField),
followed by UncommonType (referencing MethodType)

Parsing type descriptors can therefore be done by starting from rtype (present for all types), and adding wrapper structures around it, if needed. Properly renaming type descriptors in memory greatly helps the analysis, as these descriptors are passed as arguments to many runtime routines (as we will see in StealthWorker’s malware analysis).

The final step is to transform the type descriptors into the actual types — for example, translating a structType into the memory representation of the corresponding structure –, which can then be imported in JEB types. For now, TypesBuilder do this final import step for named structures only.

Describing in details all Golang’s type descriptors is out-of-scope for this blog. Refer to TypesBuilder module for gory details.

Locating Type Descriptors

The last question we have to examine is how to actually locate type descriptors in Golang binaries. This starts with a structure called moduledata, whose purpose is to “record information about the layout of the executable“:

    type moduledata struct {
      pclntable    []byte
      ftab         []functab
      filetab      []uint32
      findfunctab  uintptr
      minpc, maxpc uintptr

      text, etext           uintptr
      noptrdata, enoptrdata uintptr
      data, edata           uintptr
      bss, ebss             uintptr
      noptrbss, enoptrbss   uintptr
      end, gcdata, gcbss    uintptr
      types, etypes         uintptr

      textsectmap []textsect
      typelinks   []int32 // offsets from types
      itablinks   []*itab

      [...REDACTED...]
    }

This structure defines in particular a range of memory dedicated to storing type information (from types to etypes). Then, typelink field stores offsets in the range where type descriptors begin.

So first we locate moduledata, either from a specific symbol for non-stripped binaries, or through a brute-force search. For that, we search for the address of pclntab previously found (first moduledata field), and then apply some checks on its fields.

Second, we start the actual parsing of the types range, which is a recursive process as some types reference others types, during which we apply the type descriptors’ structures.

There is no backward compatibility requirement on runtime’s internal structures — as Golang executables embed their own runtime. In particular, moduledata and type descriptions are not guaranteed to stay backward compatible with older Golang release (and they were already largely modified since their inception).

In others words, TypesBuilder module’s current implementation might become outdated in future Golang releases (and might not properly work on older versions).

Use-Case: StealthWorker

We are now going to dig into a malware dubbed StealthWorker. This malware infects Linux/Windows machines, and mainly attempts to brute-force web platforms, such as WordPress, phpMyAdmin or Joomla. Interestingly, StealthWorker heavily relies on concurrency, making it a target of choice for a first analysis.

The sample we will be analyzing is a x86 Linux version of StealthWorker, version 3.02, whose symbols have been stripped (SHA1: 42ec52678aeac0ddf583ca36277c0cf8ee1fc680)

Reconnaissance

Here is JEB’s console after disassembling the sample and running the script with all modules activated (FunctionsFinder, StringsBuilder, TypesBuilder, DuffDevicesFinder, PointerAnalyzer):

>>> Golang Analyzer <<<
> pclntab parsed (0x84B79C0)
> first module data parsed (0x870EB20)
> FunctionsFinder: 9528 function entry points enqueued (and renamed)
> FunctionsFinder: running disassembler... OK
 > point of interest: routine runtime.GOROOT (0x804e8b0): references Go root path of developer's machine (sys.DefaultGoroot)
 > point of interest: routine runtime.schedinit (0x8070e40): references Go version (sys.TheVersion)
> StringsBuilder: building strings... OK (4939 built strings)
> TypesBuilder: reconstructing types... OK (5128 parsed types - 812 types imported to JEB - see logs)
> DuffDevicesFinder: finding memory zero/copy routines... OK (93 routines identified)
> PointerAnalyzer: 5588 pointers renamed
> see logs (C:\[REDACTED]\log.txt)

Let’s start with some reconnaissance work:

  • The binary was compiled with Go version 1.11.4 (referenced in runtime.schedinit‘s code, as mentioned by the script’s output)
  • Go’s root path on developer’s machine is /usr/local/go (referenced by runtime.GOROOT‘s code)
  • Now, let’s turn to the reconstructed strings; there are too many to draw useful conclusions at this point, but at least we got an interesting IP address (spoiler alert: that’s the C&C’s address):
Fig. 17: Extract of StealthWorker’s strings
as seen in JEB after running the script
  • More interestingly, the list of source files extracted from pclntab (outputted in the script’s log.txt) shows a modular architecture:
> /home/user/go/src/AutorunDropper/Autorun_linux.go
> /home/user/go/src/Check_double_run/Checker_linux.go
> /home/user/go/src/Cloud_Checker/main.go
> /home/user/go/src/StealthWorker/WorkerAdminFinder/main.go
> /home/user/go/src/StealthWorker/WorkerBackup_finder/main.go
> /home/user/go/src/StealthWorker/WorkerBitrix_brut/main.go
> /home/user/go/src/StealthWorker/WorkerBitrix_check/main.go
> /home/user/go/src/StealthWorker/WorkerCpanel_brut/main.go
> /home/user/go/src/StealthWorker/WorkerCpanel_check/main.go
> /home/user/go/src/StealthWorker/WorkerDrupal_brut/main.go
> /home/user/go/src/StealthWorker/WorkerDrupal_check/main.go
> /home/user/go/src/StealthWorker/WorkerFTP_brut/main.go
> /home/user/go/src/StealthWorker/WorkerFTP_check/main.go
> /home/user/go/src/StealthWorker/WorkerHtpasswd_brut/main.go
> /home/user/go/src/StealthWorker/WorkerHtpasswd_check/main.go
> /home/user/go/src/StealthWorker/WorkerJoomla_brut/main.go
> /home/user/go/src/StealthWorker/WorkerJoomla_check/main.go
> /home/user/go/src/StealthWorker/WorkerMagento_brut/main.go
> /home/user/go/src/StealthWorker/WorkerMagento_check/main.go
> /home/user/go/src/StealthWorker/WorkerMysql_brut/main.go
> /home/user/go/src/StealthWorker/WorkerOpencart_brut/main.go
> /home/user/go/src/StealthWorker/WorkerOpencart_check/main.go
> /home/user/go/src/StealthWorker/WorkerPMA_brut/main.go
> /home/user/go/src/StealthWorker/WorkerPMA_check/WorkerPMA_check.go
> /home/user/go/src/StealthWorker/WorkerPostgres_brut/main.go
> /home/user/go/src/StealthWorker/WorkerSSH_brut/main.go
> /home/user/go/src/StealthWorker/WorkerWHM_brut/main.go
> /home/user/go/src/StealthWorker/WorkerWHM_check/main.go
> /home/user/go/src/StealthWorker/WorkerWP_brut/main.go
> /home/user/go/src/StealthWorker/WorkerWP_check/main.go
> /home/user/go/src/StealthWorker/Worker_WpInstall_finder/main.go
> /home/user/go/src/StealthWorker/Worker_wpMagOcart/main.go
> /home/user/go/src/StealthWorker/main.go

Each main.go corresponds to a Go package, and its quite obvious from the paths that each of them targets a specific web platform. Moreover, there seems to be mainly two types of packages: WorkerTARGET_brut, and WorkerTARGET_check.

There are no information regarding the time of compilation in Golang executables. In particular executables’ timestamps have been set to a fixed value at compilation, in order to always generate the same executable from a given input.

  • Let’s dig a bit further by looking at main package, which is where execution begins; here are its routines with pretty informative names:
Fig. 18: main’s package routines

Additionally there is a series of type..hash* and type..eq* methods for main package:

Fig. 19: Hashing methods (automatically generated for complex types)

These methods are automatically generated for types equality and hashing, and therefore their presence indicates that non-trivial custom types are used in main package (as we will see below).

We can also examine main.init() routine. The init() routine is generated for each package by Golang’s compiler to initialize others packages that this package relies on, and the package’s global variables:

Fig. 20: Packages initialization from main.init()

Along the previously seen packages, one can notice some interesting custom packages:

  • github.com/remeh/sizedwaitgroup: a re-implementation of Golang’s WaitGroup — a mechanism to wait for goroutines termination –, but with a limit in the amount of goroutines started concurrently. As we will see, StealthWorker’s developer takes special care to not overload the infected machine.
  • github.com/sevlyar/go-daemon: a library to write daemon processes in Go.

Golang packages’ paths are part of a global namespace, and it is considered best practice to use GitHub’s URLs as package paths for external packages to avoid conflicts.

Concurrent Design

In this blog, we will not dig into each StealthWorker’s packages implementation, as it has been already been done several times. Rather, we will focus on the concurrent design made to organize the work between these packages.

Let’s start with an overview of StealthWorker’s architecture:

Fig. 21: StealthWorker’s design overview

At first, a goroutine executing getActiveProject() regularly retrieves a list of “projects” from the C&C server. Each project is identified by a keyword (wpChk for WordPress checker, ssh_b for SSH brute-forcer, etc).

From there, the real concurrent work begins: five goroutines executing PrepareTaskFunc() retrieve a list of targets for each project, and then distribute work to “Workers”. There are several interesting quirks here:

  • To allow PrepareTaskFunc() goroutines to communicate with Worker() goroutines, a Channel is instantiated:
Fig. 22: Channel’s instantiation

As can be seen from the channel type descriptor — parsed and renamed by the script –, the Channel is made for objects of type interface {}, the empty interface. In others words, objects of any type can be sent and received through it (because “direction:both”).

PrepareTaskFunc() will then receive from the C&C server a list of targets for a given project — as JSON objects –, and for each target will instantiate a specific structure. We already noticed these structures when looking at main package’s routines, here are their reconstructed form in the script’s logs:

> struct main.StandartBrut (4 fields):
    - string Host (offset:0)
    - string Login (offset:8)
    - string Password (offset:10)
    - string Worker (offset:18)

> struct main.StandartChecker (5 fields):
    - string Host (offset:0)
    - string Subdomains (offset:8)
    - string Subfolder (offset:10)
    - string Port (offset:18)
    - string Worker (offset:20)

> struct main.WPBrut (5 fields):
    - string Host (offset:0)
    - string Login (offset:8)
    - string Password (offset:10)
    - string Worker (offset:18)
    - int XmlRpc (offset:20)

> struct main.StandartBackup (7 fields):
    - string Host (offset:0)
    - string Subdomains (offset:8)
    - string Subfolder (offset:10)
    - string Port (offset:18)
    - string FileName (offset:20)
    - string Worker (offset:28)
    - int64 SLimit (offset:30)

> struct main.WpMagOcartType (5 fields):
    - string Host (offset:0)
    - string Login (offset:8)
    - string Password (offset:10)
    - string Worker (offset:18)
    - string Email (offset:20)

> struct main.StandartAdminFinder (6 fields):
    - string Host (offset:0)
    - string Subdomains (offset:8)
    - string Subfolder (offset:10)
    - string Port (offset:18)
    - string FileName (offset:20)
    - string Worker (offset:28)

> struct main.WPChecker (6 fields):
    - string Host (offset:0)
    - string Subdomains (offset:8)
    - string Subfolder (offset:10)
    - string Port (offset:18)
    - string Worker (offset:20)
    - int Logins (offset:28)

Note that all structures have Worker and Host fields. The structure (one per target) will then be sent through the channel.

  • On the other side of the channel, a Worker() goroutine will fetch the structure, and use reflection to generically process it (i.e. without knowing a priori which structure was sent):
Fig. 23: StealthWorker’s use of reflection to retrieve a field from an unknown structure

Finally, depending on the value in Worker field, the corresponding worker’s code will be executed. There are two types of workers: brute-forcing workers, which try to login into the target through a known web platform, and checking workers, which test the existence of a certain web platform on the target.

From a design point-of-view, there is a difference between the two types of workers: checking workers internally relies on another Channel, in which the results are going to be written, and fetched by another goroutine named saveGood(), which reports to the C&C. On the other hand, brute-forcing workers do their task and directly report to the C&C server.

  • Interestingly, the maximum number of Worker() goroutines can be configured by giving a parameter to the executable (preceded by the argument dev). According to the update mechanism, it seems that the usual value for this maximum is 400. Then, the previously mentioned SizedWaitGroup package serves to ensure the number of goroutines stay below this value:
Fig. 24: Worker’s creation loop
SizeWaitGroup.Add() is blocking when the maximum number of goroutines has been reached. Each main.Worker() will release its slot when terminating.

We can imagine that the maximum amount of workers is tuned by StealthWorker’s operators to lower the risk of overloading infected machines (and drawing attention).

There are two additional goroutines, respectively executing routines KnockKnock() and CheckUpdate(). Both of them simply run specific tasks concurrently (and infinitely): the former sends a “ping” message to the C&C server, while the latter asks for an updated binary to execute.

What’s Next? Decompilation!

The provided Python script should allow users to properly analyze Linux and Windows Golang executables with JEB. It should also be a good example of what can be done with JEB API to handle “exotic” native platforms.

Regarding Golang reverse engineering, for now we remained at disassembler level, but decompiling Golang native code to clean pseudo-C is clearly a reachable goal for JEB. There are a few important steps to implement first, like properly handling Golang stack-only calling convention (with multiple return values), or generating type libraries for Golang runtime.

So… stay tuned for more Golang reverse engineering!

As usual, if you have questions, comments or suggestions, feel free to:

References

A few interesting reading for reverse engineers wanting to dig into Golang’s internals:

  1. The Golang compiler was originally inherited from Plan9 and was written in C, in order to solve the bootstrapping problem (how to compile a new language?), and also to “easily” implement segmented stacks — the original way of dealing with goroutines stack. The process of translating the original C compiler to Golang for release 1.5 has been described in details here and here.
  2. There are alternate compilers, e.g. gccgo and a gollvm
  3. Golang also allows to compile ‘modules’, which can be loaded dynamically. Nevertheless, for malware writers statically-linked executables remain the usual choice.
  4. Readers interested in the internals of JEB disassembler engine should refer to our recent REcon presentation

Native Signatures Generation

JEB 3.3 ships with our internal tool SiglibGen to generate signatures for native routines. Until now, users could sign individual routines only from JEB user interface (menu Native> Create Signature for Procedure), or with the auto-signing mode.

With the release of SiglibGen, users can now create signatures for whole files in batch mode, notably executables (PE, ELF) libraries (Microsoft COFF and AR files) and JDB2 (JEB project files)1.

In this post, we will explain how SiglibGen allows power-users to generate custom signature libraries, in order to quickly identify similar code between different executables.

Signature Libraries (siglibs)

Signature libraries are stored in <JEB install folder>/siglibs folder. Each signature contains a set of features identifying a routine (detailed below), and a set of attributes representing the knowledge about the routine (name, internal labels, comments…).

JEB currently ships with signature libraries for x86/x64 Microsoft Visual Studio libraries (from Visual Studio 2008 to 2017), and for ARM/ARM64 Android NDKs (from NDKr10 to NDKr19). These signatures will be automatically loaded when a suitable file is opened (see File>Engines>Signature Libraries for the complete list of available signature libraries).

These compiler signatures are intended to be “false positive free”, i.e. they should only identify the exact same routine (though it can be mapped at a different location). Therefore, the signatures can be blindly trusted by users, and by JEB automatic analysis2.

But users might want to generate their own signature libraries, for example in the following scenarios:

  • User analyzed an unknown executable. The resulting JDB2 file can then be signed, such that all routines can be identified in others executables and related information (name, comments, labels) be imported.
  • User found out that an executable is statically linked with a public library. The library can then be compiled with symbols and signed such that the library routines will be renamed in the analyzed executable3.

Use Case: Operation ShadowHammer

To illustrate the signatures generation process, we are going to use the recent attack dubbed “Operation ShadowHammer” as an example. This operation was originally documented by Kaspersky. Roughly summarized, malicious code was inserted into a legitimate ASUS’s automatic update tool named “ASUS Live Update Utility” 4 .

In this use case, we are going to put ourselves in the shoes of an analyst willing to understand the trojanized ASUS installers. We do not intend to analyze them in-depth – it has been done several times already -, but rather show how SiglibGen can accelerate the analysis.

At first, we got our hands on three samples, originally mentioned in CounterCept’s analysis with their date of use:

SHA-256Date Of Use
6aedfef62e7a8ab7b8ab3ff57708a55afa1a2a6765f86d581bc99c738a68fc74June
736bda643291c6d2785ebd0c7be1c31568e7fa2cfcabff3bd76e67039b71d0a8September
9a72f971944fcb7a143017bc5c6c2db913bbb59f923110198ebd5a78809ea5fcOctober

Oldest Sample

Quick Analysis

An analyst would likely start looking at the oldest sample (6aedfef6…), in order to investigate possible evolution of the attack. In this sample, the installer’s main() routine was modified to load a malicious PE executable from its resources:

JEB Project View. The embedded executable can be seen in resources5.

Here is the memory map after opening the malicious executable in JEB:

Embedded PE navigation view. Blue is code, cyan is code identified by siglib, green is data.

The large chunks of cyan correspond to routines identified as being part of “Microsoft Visual C++ 2010 /MT” libraries. Then, we analyzed the remaining seven routines (the blue chunk in the navigation view), and renamed them as follow:

Malicious Routines (our names)

These routines implement the following logic: check if one of the machine’s MAC address match a hard coded list, and if it’s the case download a payload (otherwise a .idx log file is dropped).

Now in order to re-use this knowledge on more recent trojanized ASUS installers, let’s generate signatures for this first sample.

Generating Signatures

In order to sign the analyzed file, we are going to create a configuration file from the sample file provided in <JEB install folder>/siglibs/custom:

;------------------------------------------------------------------------------
; *** SAMPLE *** JEB Signature Library configuration file
;------------------------------------------------------------------------------

;template file used to configure the generation of a *.siglib file for JEB

;how to generate the siglib specified by this file?
;open a terminal and execute: (eg, on Windows)
;  $ ..\..\jeb_wincon.bat -c --siglibgen=sample-siglib.cfg

;(mandatory) name of the folder containing files to sign
; must be in the same folder as this configuration file 
input_folder_name=

;(mandatory) processor type 
; see com.pnfsoftware.jeb.core.units.codeobject.ProcessorType 
; eg: X86, X86_64, ARM, ARM64, MIPS, MIPS64
processor=

;(mandatory) output siglib file name
; '.siglib' extension will be appended to it
; IMPORTANT! once generated, this file must be moved to the <JEB>/siglibs/ folder
; (user generated siglibs have to be manually loaded)
output_file_name=mysiglib

;(mandatory) unique identifier for your siglib
; keep it < 0 and decrement for each package you generate
uuid=-1

;(mandatory) *absolute* path to JEB typelibs folder, usually <JEB>/typelibs
typelibs_folder=

;(mandatory) name of your package
; e.g. 'Microsoft Visual C++ 2008 signatures' (without '')
package_name=

;(mandatory) package version
package_version=0

;(optional) description of your package
package_description=

;(optional) package author
package_author=

;(mandatory) list of features included in each signature
; i.e. the characteristics of the signed routines serving to identify them
; see com.pnfsoftware.jeb.core.units.code.asm.sig.NativeFeatureSignerID
; note: defaults should be suitable for most cases. ROUTINE_SIZE must always be included. 
features=ROUTINE_SIZE,ROUTINE_CODE_HASH,CALLED_ROUTINE_NAME_ONLY_EXTERN 

;(mandatory) list of attributes included in each signature
; i.e. additional knowledge on the signed routines conveyed by signatures 
; (other than routine name)
; see com.pnfsoftware.jeb.core.units.code.asm.sig.NativeAttributeSignerID
attributes=COMMENT,LABEL

A particularly interesting part of this configuration is the features field, where users can select the characteristics of the routine they want to put in signatures. The complete feature list can be found here; here are the features we included in our case (the default ones):

Feature NameDescription
ROUTINE_SIZESize of the routine (number of instructions).
ROUTINE_CODE_HASHCustom hash computed from the routine assembly code.
CALLED_ROUTINE_NAME_ONLY_EXTERNNames of the external routines called by the signed routine.

Note that by including ROUTINE_CODE_HASH, our signatures will only match routines with the exact same code (but possibly mapped at a different location). The use of
CALLED_ROUTINE_NAME_ONLY_EXTERN allows to distinguish different wrapper routines calling different API routines, but having the same code.

Here is the specific configuration file shadowhammer-oldest.cfg we made for this first sample:

input_folder_name=input
processor=X86
output_file_name=shadowhammer-6aedfef6
uuid=-1
typelibs_folder=[...REDACTED...]\typelibs
package_name=ShadowHammer -- sample 6aedfef6 (oldest)
package_version=0
package_description=Signatures generated from the analysis of the oldest sample known
package_author=Joan Calvet
features=ROUTINE_SIZE,ROUTINE_CODE_HASH,CALLED_ROUTINE_NAME_ONLY_EXTERN 
attributes=COMMENT,LABEL

Then we put the JDB2 file of the analyzed sample into the input folder (see configuration’s input_folder_name field). SiglibGen can then be called by executing JEB startup script (e.g. jeb_wincon.bat) with the following flags:

$jeb -c --siglibgen=shadowhammer-oldest.cfg

The generated signature libraries will then be written in the output folder. In our case, SiglibGen signed our seven routines, as indicated in siggen_stat.log file 6:

> Package created on 2019.05.01.15.29.23
> metadata: X86/ShadowHammer -- sample 6aedfef6 (oldest)/0/Signatures generated from the analysis of the oldest sample known/Joan Calvet/1556738959
> # sigs created: 7
> # very small routines: 0
> # small routines: 0
> # medium routines: 6
> # large routines: 1
> # unnamed routines: 1
> # blacklisted routines: 0
> # duplicated routines: 0

We can now copy shadowhammer-6aedfef6.siglib to <JEB>/siglibs/ folder. It will now be available under File>Engines>Signature Libraries to be manually loaded.

Second Sample Analysis

Now, it is time to turn to the second sample (736bda6432…). The workflow is quite different from the previous one: a routine call has been inserted into Visual Studio library method __crtExitProcess, which is called whenever the program exists:

Trojanized __crtExitProcess. Call to __crtCorExitProcess was replaced by a call to malicious code.

The astute reader might wonder why the routine is still named __crtExitProcess(), as if it was the original one, if one of its call has been rewritten to point elsewhere. In this case, the routine’s name comes from the fact that several caller routines were identified as library code (and are known to call __crtExitProcess()), as indicated by the routine header comment “Routine’s name comes from a caller […]”.

Following the dubious call, we end up decrypting the malicious payload, which is then executed. We can load the malicious dump in JEB with the x86 processor and the correct base address. After manually defining the code area, we obtain the following navigation view:

Memory dump’s initial navigation view. Blue is code.

For now, no compiler signature libraries were loaded because it is a memory dump without a proper PE header. As we know the previous malicious sample was compiled with Visual Studio 2010 /MT libraries, we can manually load the corresponding signatures (File>Engines>Signature Libraries). Here is the navigation bar at this time:

Memory dump’s navigation view with Visual Studio 2010 /MT signature libraries loaded. Blue is code, cyan is library code.

Most of the code has been identified. Now, we can load the custom signatures we generated from the previous sample, and we end up with two more routines being identified (i.e. miscreants directly re-used them from the first sample):

We can now look at the non-identified routines, without having to reanalyze the duplicates.

Finally, after having analyzed the remaining routines, we can generate a new signature library, following the same steps previously described. This time we put two samples in the input folder (the trojanized installer’s JDB2, and the memory dump’s JDB2). Eight routines are then signed.

Third Sample Analysis

The most recent sample (9a72f971944f…) follows the same logic as the previous one, namely it dynamically decrypts the malicious code, which is then executed. As previously, we load the memory dump in JEB with Visual Studio 2010 /MT signatures:

Memory dump’s navigation view with Visual Studio 2010 /MT signature libraries loaded.

Finally, we load the ShadowHammer signature libraries generated from the previous two samples:

Memory dump’s navigation view with
Visual Studio 2010 /MT and ShadowHammer signature libraries loaded.

At this point, only one malicious routine has not been identified (the large blue area in the navigation view). We can now focus on it, knowing that the rest of the code is the same.

If we open the two binaries side-by-side, we can rapidly pinpoint that the unidentified routine has indeed been modified between the two samples. For example:

It appears the hardcoded list of searched MAC addresses (represented by their MD5 hashes) has been modified between the two samples.

Conclusion

We hope this blog post demonstrated how SiglibGen allows users to speed up their analysis by easily re-using their work. Remember that signatures can be generated in a lighter manner directly from JEB UI (as shown in the auto-signing mode video). As usual, do not hesitate to contact us if you have any questions (emailTwitterSlack).

Note: SiglibGen might set .parsers.*.AnalysisStyle and .parsers.*.AllowAdvancedAnalysis engines option to specific values suitable for signatures generation, without restoring the original values after the generation. For now, JEB power-users have to manually restore these two engines options to the intended values after having generated signatures (menu Edit>Options>Engines). This will be fixed in next release JEB 3.4.

Annex: SiglibGen Log Files

A typical SiglibGen run will produce several log files (in the same folder):

File NamePurpose
siggen_stat.logSummary log (number of signatures created, etc). A new entry is appended to the log file at each signature generation.
siggen_report.htmlComplete HTML log file; each signed routine is shown with the corresponding features and attributes.
conflicts.txtConflict resolution file; users can tweak here the decisions taken when several routines have the same features (and then regenerate the signatures).
removals.txtRemovals resolution file; users can tweak here the automatic decisions regarding removing certain signatures (and then regenerate the signatures) .

  1. More formats could be handled, do not hesitate to contact us if you have such needs.
  2. While the signatures shown in this blog post will also be generated in a false positive free manner, SiglibGen allows to build more flexible signatures; this will the topic of another blog post.
  3. If signatures were built to be strict (i.e. not allowing any modifications to the original routine), this can be far from trivial, as the library needs to be compiled with the exact same options as the analyzed executable.
  4. There are numerous excellent analysis available for Operation ShadowHammer, like the one from CounterCept.
  5. Note that thanks to JEB recursive processing, the embedded executable does not need to be extracted, and can be directly analyzed within the original JEB’s project
  6. See Annex for a description of all log files produced by SiglibGen.

JEB3 Auto-Signing Mode

In this video we introduce a novel JEB 3.0 feature: auto-signing mode for native code.

In a nutshell, when this mode is activated all modifications made by users to native code in JEB (renaming a routine, adding a comment, etc) are “signed”.

The newly created signatures can then be loaded against another executable, and all the information of the original analysis will be imported if the same code is recognized. Therefore, the user only needs to analyze each routine once.

Without further ado, here is the video, which begins by introducing native signatures before showcasing auto-signing:

As usual, feel free to reach out to us (email, Twitter, Slack) if you have questions or suggestions.

Having Fun with Obfuscated Mach-O Files

Last week was the release of JEB 2.3.7 with a brand new parser for Mach-O, the executable file format of Apple’s macOS and iOS operating systems. This file format, like its cousins PE and ELF, contains a lot of technical peculiarities and implementing a reliable parser is not a trivial task.

During the journey leading to this first Mach-O release, we encountered some interesting executables. This short blog post is about one of them, which uses some Mach-O features to make reverse-engineering harder.

Recon

The executable in question belongs to a well-known adware family dubbed InstallCore, which is usually bundled with others applications to display ads to the users.

The sample we will be using in this post is the following:

57e4ce2f2f1262f442effc118993058f541cf3fd: Mach-O 64-bit x86_64 executable

Let’s first take a look at the Mach-O sections:

Figure 1 – Mach-O sections

Interestingly, there are some sections related to the Objective-C language (“__objc_…”). Roughly summarized, Objective-C was the main programming language for OS X and iOS applications prior the introduction of Swift. It adds some object-oriented features around C, and it can be difficult to analyze at first, in particular because of its way to call methods by “sending messages”.

Nevertheless, the good news is that Objective-C binaries usually come with a lot of meta-data describing methods and classes, which are used by Objective-C runtime to implement the message passing. These metadata are stored in the “__objc_…” sections previously mentioned, and the JEB Mach-O parser process them to find and properly name Objective-C methods.

After the initial analysis, JEB leaves us at the entry point of the program (the yellow line below):

Figure 2 – Entry point

Wait a minute… there is no routine here and it is not even correct x86-64 machine code!

Most of the detected routines do not look good either; first, there are a few objective-C methods with random looking names like this one:

Figure 3 – Objective-C method

Again the code makes very little sense…

Then comes around 50 native routines, whose code can also clearly not be executed “as is”, for example:

Figure 4 – Native routine

Moreover, there are no cross-references on any of these routines! Why would JEB disassembler engine – which follows a recursive algorithm combined with heuristics – even think there are routines here?!

Time for a Deep Dive

Code Versus Data

First, let’s deal with the numerous unreferenced routines containing no correct machine code. After some digging, we found that they are declared in the LC_FUNCTION_STARTS Mach-O command – “command” being Mach-O word for an entry in the file header.

This command provides a table containing function entry-points in the executable. It allows for example debuggers to know function boundaries without symbols. At first, this may seem like a blessing for program analysis tools, because distinguishing code from data in a stripped executable is usually a hard problem, to say the least. And hence JEB, like other analysis tools, uses this command to enrich its analysis.

But this gift from Mach-O comes with a drawback: nothing prevents miscreants to declare function entry points where there are none, and analysis tools will end up analyzing random data as code.

In this binary, all routines declared in LC_FUNCTION_STARTS command are actually not executable. Knowing that, we can simply remove the command from the Mach-O header (i.e. nullified the entry), and ask JEB to re-analyze the file, to ease the reading of the disassembly. We end up with a much shorter routine list:

Figure 5 – Routine list

The remaining routines are mostly Objective-C methods declared in the metadata. Once again, nothing prevents developers to forge these metadata to declare method entry points in data. For now, let’s keep those methods here and focus on a more pressing question…

Where Is the Entry Point?

The entry point value used by JEB comes from the LC_UNIXTHREAD command contained in the Mach-O header, which specifies a CPU state to load at startup. How could this program be even executable if the declared entry point is not correct machine code (see Figure 2)?

Surely, there has to be another entry point, which is executed first. There is one indeed, and it has to do with the way the Objective-C runtime initializes the classes. An Objective-C class can implement a method named “+load” — the + means this is a class method, rather than an instance method –, which will be called during the executable initialization, that is before the program main() function will be executed.

If we look back at Figure 5, we see that among the random looking method names there is one class with this famous +load method, and here is the beginning of its code:

Figure 6 – +load method

Finally, some decent looking machine code! We just found the real entry point of the binary, and now the adventure can really begin…

That’s it for today, stay tuned for more technical sweetness on JEB blog!

Introducing the JEB Malware Sharing Network

Update: Oct. 12: Python script to query the API

We are very excited to announce that JEB 2.3.6 integrates with a new project we called the Malware Sharing Network. It allows reverse engineers to share samples anonymously, in a give-and-take fashion. The more and the better you give, the more and the better you will receive.

  • Files are shared with PNF Software (they are not shared directly with other users);
  • Contributions and users are algorithmically ranked and scored;
  • In exchange for their contributions, users receive more files, based on their score.

The goal is to offer a platform for reversers that can (and wish to) share malware files to easily do it, with the added incentive of receiving samples in return — including relatively high-value files that may not be accessible to most users, such as files that are not publicly downloadable on most malware trackers; or files that are not present on malware databases at all, including VirusTotal.

Obviously, the service is entirely optional. Any user, including users of the demo version, may use it whenever they please.

Getting started

The latest JEB update will let you know about the Malware Sharing Network right after you upgrade. You may also click the Share button in the toolbar at any time to get started.

Sharing a sample is easy!

First time users should create an account. You will only need an email address and a password. Click the “Create an Account” button to sign up.

Log in to the Malware Sharing Network

Once you’ve successfully logged in, you will be able to view your profile. Things like your sharing score and other stats are displayed.

User profile and stats

Sharing a File

Any time you are working in JEB, you can decide to share the primary file being worked on by clicking the Share button or the Share entry in the File menu:

Before sharing a file, you may:

  • redact the sample name;
  • add a text comment;
  • select a Determination, among four choices (“Unknown”, “Clean”, “Unsure” and “Malicious”).

By hitting the Share button, you will submit the file to PNF Software. It will be added to our file portal, get scored, and eventually, be shared with other users who are participating in this sample exchange program.

When your score gets high enough, you will receive samples. They will be accessible from our website, and also, using the Malware Sharing Network back-end API.

API for Scripting

After successfully logging in, you may have noticed that the API key field was populated. Power-users will be able to use it to perform automation and scripting with our back-end, such as querying samples by hashes, uploading and downloading files, etc. It’s all standard HTTP-POST queries with JSON responses.

A Python wrapper to  issue simple API queries can be found on our public GitHub repository.  First make sure to set up your API key (either in source, or create an environment variable JEBIO_APIKEY, or pass it as a parameter if you are importing the script as a library).

Queries return JSON output,  except for download requests, that return binary attachments. The return “code” variable is set to 0 on success, !=0 on error.

Here are a few examples:

Query a file hash:
$ jebio.py check 42aaa93a894a69bfcbc21823b09e4ea9f723c428
42aaa93a894a69bfcbc21823b09e4ea9f723c428: {
 "code": 0,
 "created": "2017-10-09 16:24:31",
 "filesize": 75599,
 "filestatus": 0,
 "md5hash": "879322cfd1c1b3b1813a27c3e311f1a5",
 "sha1hash": "42aaa93a894a69bfcbc21823b09e4ea9f723c428",
 "sha256hash": "57ae463e6bc53a38512c58a878370338dcfe0fb59eeedfd9b3e7959fe7c149d1",
 "userdetails": {
 "comments": "",
 "created": "2017-10-09 16:24:31",
 "determination": 0,
 "filename": "Raasta.apk"
 }
}

Note: the userdetails section is present only if you uploladed the file yourself.

Upload a file:
$ jebio.py upload 1.apk
1.apk: {
 "code": 0,
 "uploadeventid": 155
}
Download a file: (subject to permission)
$ jebio.py download a2ba1bacc996b90b37a2c93089692bf5f30f1d68
a2ba1bacc996b90b37a2c93089692bf5f30f1d68: downloaded to ba1d6f317214d318b2a4e9a9663bc7ec867a6c845affecad1290fd717cc74f29.zip (password: "infected")

Automatic Identification of Mirai Original Code

Context

One of the major threat on embedded devices — the so-called “Internet of things” –, is the infamous Mirai malicious software, whose source code was made public in September 2016. This malware has the ability to infect devices by brute-forcing Telnet credentials, and is primarily used to launch distributed denial-of-service attacks.

Since the source code release, numerous Mirai variants have been deployed in the wild by miscreants, like the one we documented in a recent post.

In this blog we will first take a quick look at another Mirai-based malware, quite original in its own way, to then introduce our novel signature system that can identify Mirai original code in executables.

Yet Another Mirai Variant

On May 18th, ESET’s Michal Malík mentioned on Twitter a Mirai-based sample for MIPS that grabbed our attention. Michal pointed out new functionalities like a custom update mechanism, and some strange debug routines, so we decided to take a look with our brand new MIPS decompiler. It should be noted that this sample comes with the debug symbols, which explains the names present in the decompiler output.

The malware logic starts in its main() routine, which is shown below as decompiled by JEB.

Briefly summarized, this routine first sets up a few signal handlers, in particular to create a core file in case of segmentation fault. It then calls a homemade panic() function — not to be confused with the standard Linux panic() routine. The panic() function code is shown below, as seen in JEB.

While the routine native code — seen on the left side — can be pretty dry to read, the decompiled code on the right side is fairly straightforward: a file named file.txt is opened and a given error message is written to it, accompanied by a custom system footprint built by the footprint12() routine.

Finally, main() calls the kill_run_mobile1() function, which first kills any application listening on TCP port 18899 (likely others instances of the same malware),  and then creates a thread on the mobile_loop1() function, which is shown below.

The new thread will listen for incoming connections and process them through a custom command handler. As can be seen from the numerous debug messages in the decompiled code, the code is still in a development stage.

To summarize, this sample appears to be an attempt to repackage Mirai source code with a different update mechanism, and is still in development, as can be seen from the presence of debug routines, and the fact that plenty of code remains unused.

While the technical quality of this sample is dubious, it illustrates one of the major consequence of Mirai source code public release: it has lowered the bar of entry for malicious software developers. In particular, we can expect the strain of Mirai-based malicious software to continue to grow in the following months.

Native Code Signatures

In a context where numerous Mirai-based malware are deployed in the wild, having the ability to identify original Mirai code becomes particularly useful, as it allows the analyst to focus only on the new functionalities in each sample.

Of course, most of Mirai-based samples do not come with symbols, and hence we need a proper mechanism to identify Mirai original code. That is the purpose of the native signature system released with JEB 2.3, which can actually identify code for all native architectures supported by JEB (x86, ARM, MIPS and the associated variants).

The objective of this signature system is to identify native routines with a minimal number of false positives. In others words, we want to fully trust a successful identification, while we may miss some known routines.

To realize this low false positives goal, our signatures are primarily based on two features:

  • A custom hash computed on the binary code of the unknown routine. During this computation, we remove from the native instructions the addresses and offsets that may vary depending on where the routine is located in a binary. Hence the same routine located at a different place will have the same hash. Interestingly, as our algorithm uses the generic JEB interface on native instructions (IInstruction), the hash computation is done on all architectures in the same way.
  • The names of the routines called by the unknown routine, e.g. API routines, system calls, or already identified routines. This feature allows to distinguish wrappers that have exactly the same binary code but call a different routine.  

The whole signature process can be summarized in two steps — which will be described in details in a separate documentation:

  1. Signatures are generated from a reference file. This file can be a native file with symbols, or a JEB database with some routines renamed by the user. For each named routine, a signature containing the routine features and information is created. Signatures are then grouped into packages for each platform.
  2. When JEB analyzes an unknown routine, it tries to match it with the signatures. If there is a match, the information of the original routine are imported, e.g. the matched unknown routine is renamed as the original routine.

Due to its strict reliance on the binary code, this identification process does not offer a resistance to minor changes, like the ones introduced by compilation with a different compiler version or with different optimizations. We intend to develop others signature systems in JEB, which will be more resistant to such variations, in particular by using JEB intermediate representation.

Still, it is particularly suitable in the case of Mirai, where the public source code comes with compilation instructions, such that many samples are compiled in the same way and share the exact same binary code. Therefore, JEB 2.3 comes with a set of signatures created from a non-stripped executable created from Mirai public source code.

These signatures are automatically applied when a MIPS binary is loaded in JEB. For example, here is an extract of the initial routines list after loading in JEB a stripped Mirai sample deployed last year (SHA1: 03ecd3b49aa19589599c64e4e7a51206a592b4ef).

On the 204 routines contained in the sample, 120 are automatically identified and renamed by JEB, allowing the user to focus on the unknown routines. It should be noticed that not all recognized routines belong to Mirai specific code, some of them belong to the C library used by Mirai (uClibc).

Conclusion

The JEB native signature system is still in development, but its results are encouraging and we provide a set of signatures for Mirai on MIPS platform, and for the standard C library shipped with Microsoft Visual Studio 2013 on the x86 platform. We encourage users to try it through our demo version, and report any comments to support@pnfsoftware.com.

In the following weeks, not only will the number of signatures rapidly grow — through a specific update mechanism –, but we also intend to let users generate their own signatures with JEB public API.

Acknowledgement

The malicious software analysis presented in this post was done by our intern Hugo Genesse.

Analyzing a New MIPS IoT Malware With JEB

Over the last few months, several major vulnerabilities in a certain brand of IP cameras have been publicly released. One vulnerability allows remote code execution, while another permits the retrieval of the administrator’s credentials. The situation is made worse by the fact that many of these cameras are reachable on the Internet (around 185,000 according to one of the researcher).

It did not take long for miscreants to abuse this discovery, and a novel malicious software 1 was recently propagated through the vulnerable cameras, as described in a 360.cn blog post.

This malicious software comes with MIPS and ARM versions, so we decided to quickly analyze it using our brand new MIPS decompiler. This blog post describes our findings.

Note: JEB MIPS decompiler being in beta mode, the decompiled output presented in this blog post should be considered with caution; we provide it mainly to allow the reader to get an idea of JEB capabilities. As we are constantly refining the decompiler, the produced code will strongly evolve in the next few months.

Recon

The sample we will be analyzing is the following:

7A0485E52AA09F63D41E471FD736584C06C3DAB6: ELF 32-bit MSB executable, MIPS, MIPS-I version 1 (SYSV), statically linked, stripped

After opening it in JEB, our disassembler found 526 routines. To give the reader an idea, here is what it looks like at the program entry point:

We can see here the disassembled MIPS code, which can be a hard language to read to say the least. Hopefully JEB is able to decompile it, as shown below (names are our own):

The main() routine is where the malware logic lies, and will be described below.

The interested reader might have noticed the comments in the assembly code. Those comments are the result of what we call the “advanced analysis” step, which can be roughly described as an emulation of the native code (based on JEB custom intermediate representation). This allows to find the actual values manipulated by the code, when those values are the result of previous computations. The advanced analysis will be properly described in a separate blog post.

But before going on with the analysis, one might want to take a look at the strings used by the malware, to get a sense of its abilities:

We can observe some likely C&C server information, and various strings related to the malware network abilities. Interestingly, an Arabic string clearly stands out from the others; it can be translated to “Loading Version 1”.

A final preparation step is to look at the system calls made by the malicious software, as it allows to easily understand some routines behavior. JEB automatically renames such syscalls — rather than just showing the system call number resulting from the advanced analysis phase, and displays them in a separate panel:

The user can then jump to these syscall references, and rename them appropriately, as done in the following example:

Through this process we renamed around 60 routines that are simply wrappers for syscalls.

Our reconnaissance step being done, we can now dig into the malware core logic!

Workflow

We start at the main() routine previously mentioned, and describe here the main steps of this malicious software. As we will see, part of this malware code is borrowed from the infamous Mirai malware, whose source code was made public in September 2016.

Initialization

At startup the malware does a few initialization steps, most of them being directly copy-pasted from Mirai. There is one original action though, which can be seen in the following image:

The files /tmp/ftpupdate.sh and /tmp/ftpupload.sh are first removed, then linked to /dev/null. These two files are used by various exploits against these IP cameras, and hence the malware makes sure a newly infected device can not be infected again.

C&C Commands

The malware then enters in a loop to fetch 1-byte commands from the C&C server (whose domain name is hardcoded). We counted 8 different commands, some of them having subcommands. We will now describe the most interesting ones.

Infection

As previously explained, this malware propagates by infecting vulnerable IP cameras connected to the Internet. To do so, it first scans the Internet for these devices, by re-using the TCP SYN scanner of the Mirai malware. To illustrate that, here is the scanner initialization loop, as seen in the released Mirai source code and in the decompiled code of our malware:

Scanner code, as seen in Mirai source code…

… versus the new malware code decompiled by JEB

The only major difference is that the TCP destination port is fixed to 81 in our malicious software, rather than alternate between port 23 and 2323 in Mirai. It is worth noting than even the loop counter has the same value (SCANNER_RAW_PPS is set to 160 in Mirai source code).

If the malware finds a device with an opened port 81, it then launches the actual exploit, which is built from a combination of publicly known vulnerabilities in the IP camera web server:

  1. Extract the device administrator’s credentials by sending an HTTP request for the file login.cgi and then parsing the answer for the administrator login and password (documented here).
  2. Send two specially crafted HTTP requests to first plant a connect-back payload on the device, and then execute it (documented here). The sending of this first request is shown below, as seen in JEB:

Once the connection has been established with the miscreants’ server thanks to the connect-back payload, the newly infected device is asked to download and run the malicious software, as described in the 360.cn blog post.

Attack Routers

Another action possibly ordered by the C&C server is to scan for UPnP enabled devices, in order to add a port forwarding entry to them. Such UPnP devices typically include home routers.

To do so, the malicious software starts to repeatedly send UPnP discovery messages to random IP addresses:

Once a UPnP enabled device has been found, a SOAP request is forged to add a new port forwarding entry in its configuration:

As mentioned in another 360.cn blog post, this code may be used to exploit the CVE-2014-8361 vulnerability, which allows to execute system commands with root privileges through the <NewInternalClient> SOAP tag. Also, notice the <NewPortMappingDescription> tag set to Skype to attempt hiding the request.

UDP DDoS

As documented in the 360.cn blog, the malicious software can launch a denial of service attack over UDP. The packets are built from the SSDP discovery message, which may also serve as a preparation step for a SSDP reflection attack, though it appears the code for that is not present in the binary.

Interestingly, there is another denial of service attack implemented, using a 25-byte payload shown below:

This payload is used in amplification attacks through Valve Source Engine servers.

Conclusion

We hope the readers enjoyed this quick analysis; feel free to ask questions in the comments section below.

JEB MIPS decompiler is currently in beta mode, and a demo version can be downloaded on our website.

  1. This malware was named http81 by 360, Persirai by ESET, or is simply recognized as a variant of Mirai by other vendors.

Analysis of Android.Golem downloader component

Recently, we came across a new malware which seems to be a module of a recent Android trojan named dubbed Golem.

Golem has been found in several countries and hundreds of thousands of phones have already been infected, according to reports.

We performed detailed analysis of the malware using JEB, the operations achieved by the malware can be divided into several steps:

Step 1

When user start the phone or unlock the screen or light the screen, the malware will automatically download a file named “conf_plugin.txt” which contains configuration information like “update”, “md5”, “url”, etc.

Step 2

Then the malware will check if there is a jar file named “ic.jar” in phone memory, if not or if its md5 is different from the md5 in “conf_plugin.txt” (which means the local dex is different from the dex in remote server), malware will download the dex.

Step 3

After that, the malware will install and run the dex and execute the “onCreate” function in the “com.facebook.mini.service.RunService” class.

The complete process can be represented by the graph below:

Based on the analysis, the malware can automatically download, launch and run application without user’s authorization. The downloaded apps will run with the set of permissions already requested by the downloader:

Through this malware, the attacker can easily get your personal information, contacts or even bank accounts and passwords. Also, the attacker can remotely control your phone to open specified application and perform some bad operations to make illicit profits.

Sample SHA256:
3cb7a4792725d381653fcca18d584f9fbe47d67f455db03e3c53e8e8e7736385

Analysis by Ruoxiao Wang

Deobfuscating Android Triada malware

The Triada malware has received a lot of news coverage recently. Kaspersky was one of the first firm to publish an analysis of this Trojan earlier last week.

The code is obfuscated, and most strings are encrypted. The string encryption algorithm is trivial, but ever-changing across classes: bytes are incremented or decremented by constant values, either stored in a default decryptor method, or retrieved via calls to other methods. The result is something quite annoying to handle if you decide to perform a serious static analysis of the file.

Encrypted string buffers in Triada. Decryption routines can be seen in the decompiled class on the right-hand side.

Our intern Ruoxiao Wang wrote a very handy decryption script for Triada. It needs customizing (the decryption keys are not automatically retrieved) on a per-class basis, but the overall effort is a couple of seconds versus hours spending doing tedious and repetitive semi-manual work.

The script will decrypt the encrypted byte arrays and replace the decompiled Java fields supposedly holding the final strings by their actual value, as seen in the picture below.

Decrypted strings. Comments (in the left-side red box) indicate the string use was not found via xrefs. The right-side red box shows updated String fields after decryption.

The script can also be used as a tutorial on how to use the JEB Java AST API to look for and modify the AST of decompiled code.  (More examples be seen on our GitHub sample script repo.)

Download the Triada decryptor script here:
TriadaStringDecryptor.py

(Specific instructions are located in the script header.)

Version 0.2.9 of the PDF analyzer plugin is available

Update (9/13/2017): we open-sourced the PDF plugin. A compiled JAR binary is also available.

We have released version 0.2.9 of our PDF analyzer plugin for JEB2. This release adds support for XFA (XML Forms Architecture) fragment streams reconstruction and parsing.

In the following example, a malicious PDF file contains two XFA streams encoded with the unusual CCITTFFax encoder. Once decoded, JEB2 reassembles the decoded contents into a unit “32 0”. The XFA contains a malicious JavaScript snippet, also visible as a separate unit:

Reconstructed XFA data showing a malicious JavaScript snippet.

Notifications reported also show a dangerous Open action.

The malicious PDF file examined in this entry is available on VirusTotal.
SHA256: e108432dd9dad6ff57c8de6e907fd6dd25b62673bd4799fa1a47b200db5acf7c