Scanning PDF Files using JEB2

Update (9/13/2017): we open-sourced the PDF plugin. A compiled JAR binary is also available.

Update: Feb. 27: Slides – Automation How-To
Update: Dec. 3: List of notifications

In this blog post, we show how JEB2 can be used as a building block of a file analysis system. We will show how to use the Core API to create a headless client. That client will scan PDF files using the JEB2 PDF Analysis Module. The basics of IUnit and related interfaces are also demonstrated.

Source code on GitHub.

Sample execution output produced by the PDF Scanner

As this slide deck shows, the back-end and front-end components of JEB2 are separated. The official RCP desktop client uses the JEB2 Core API; other front-ends, like the PDF scanner, can be built using that same API.

JEB2 HL Architecture Diagram

Creating an Eclipse project

Let’s get started by creating a new code project. We will show how to do this in Eclipse.

0- Check your JEB2 license. Make sure to use a license that supports third-party client creation and the loading of third-party plugins. If you haven’t done so already, download the PDF module and drop it in your coreplugins/ sub-directory.

1- Clone our sample code repository: git clone https://github.com/pnfsoftware/jeb2-samplecode.git

2- Create a new Java project. The Java source folder should be rooted in the src/ directory.

3- Add the JEB2 back-end as a JAR dependency. The back-end software is contained in the file bin/cl/jeb.jar located within your installation folder. You may also want to link that JAR to the API documentation, contained in the doc/apidoc.jar file, or online at https://www.pnfsoftware.com/jeb/apidoc

Your Package Explorer view should now look like:

Package explorer view after setting up dependencies

4- Set up the execution options. The required Java properties for execution (jeb.engcfg and jeb.lickey) can be set in the Run Configurations panel (accessible via the Run menu). Example:

Example of a Run configuration

5- Open the com.pnf.pdfscan.PDFScanner source file. You are ready to execute main().

How the scanner works

Now, let’s focus on the scanner source code.

  • The JEB2 back-end is initialized when scanFiles() is called:
    • Use JebCoreService to retrieve an instance to ICoreContext
    • Create an IEnginesContext
    • Load a project within that context (IRuntimeProject)
    • Add artifact(s) and process them (ILiveArtifact)
      • We add a single file artifact per project in this example
    • Retrieve the products (IUnit)
      • We are retrieving the top-most unit only in this example
    • Analyze the unit (see assessPdf())
    • Close the project

[Note: A detailed explanation of the above concepts (core, engines, project, artifacts, units, etc.) is outside the scope of this tutorial. Refer to our Developer Portal for more information.]

Snippet of scanFiles()

The assessPdf() method evaluates PDF units. The evaluation performed by this sample scanner is trivial: we collect the notifications created by the PDF plugin during the analysis of the file, and see if they meet basic criteria.

About the Unit Notifications:

  • Any JEB2 plugin can attach notifications to its units. The PDF plugin does so. Notifications are meant to pinpoint noteworthy areas of a unit or artifact.
  • A notification has a “dangerosity level” ranging from 0 to 100. It also has a description, an optional address indicating which area of the unit the notification relates to, etc.
  • The API offers standard notification types, ranging from “Interesting area” to “Definitely Malicious”.
Standard notification levels offered in the NotificationType enum

A PDF unit can contain several types of notifications. Examples include: corrupt areas in a stream; multiple encodings of a stream; JavaScript; password-protected streams; invalid/illegal entries in a stream; etc.

Link: Complete list of notifications issued by the PDF plugin.

Our simple scanner reports a file as suspicious if it contains at least 2 notifications that have a level >= 70 (POTENTIALLY_HARMFUL). These thresholds can be tweaked in the source code.
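The threshold rule can be illustrated with a minimal, self-contained sketch. It is not the scanner's actual assessPdf() code: the Notification tuple and the is_suspicious() helper are hypothetical names introduced for illustration, and only the values stated above (a level of 70 for POTENTIALLY_HARMFUL, and a minimum of 2 matching notifications) come from the text.

```python
from collections import namedtuple

# Hypothetical stand-in for a unit notification (not the JEB2 API):
# only the level (0 to 100) and the description are modeled here.
Notification = namedtuple("Notification", ["level", "description"])

# Level associated with POTENTIALLY_HARMFUL in the text above.
POTENTIALLY_HARMFUL = 70


def is_suspicious(notifications, min_level=POTENTIALLY_HARMFUL, min_count=2):
    """Return True if at least min_count notifications reach min_level."""
    hits = [n for n in notifications if n.level >= min_level]
    return len(hits) >= min_count


# Illustrative data only; the descriptions are made up for this example.
sample = [
    Notification(70, "JavaScript"),
    Notification(80, "corrupt area in stream"),
    Notification(30, "interesting area"),
]
flagged = is_suspicious(sample)
```

Keeping both thresholds as parameters mirrors the remark that they can be tweaked: a stricter policy could, for instance, call is_suspicious(sample, min_level=80).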

The assessPdf() routine

The screenshot below is a sample output produced by the PDF scanner:

Conclusion

The intent of this entry is to shed some light on the process of writing third-party clients for JEB2, as well as on how to use the notifications reported by units. We encourage you to visit our Developer Portal to find additional documentation as well as the reference Javadoc of the API.

Writing client scripts for JEB2 using Python

The latest release of JEB2, version 2.0.14, introduces a feature familiar to JEB1 users: client scripts written in Python.

Both Standard and Business licenses permit running scripts. They can be written using the Python 2.5 or 2.7 syntax and features, and are executed by Jython. (A Jython stand-alone package is required to run scripts. We recommend version 2.5. Download it and drop it in your JEB2 scripts/ sub-directory.)

Feature-wise, scripts use the standard JEB2 core APIs. They also use the client API, available in the com.pnfsoftware.jeb.client.api package. As usual, refer to our Developer Portal and Javadoc website for API reference and usage.

A client script implements the IScript interface. Upon execution, the script run() entry-point method is provided an IClientContext or derived object, such as an IGraphicalClientContext for UI clients. (The official RCP desktop client falls in the latter category.)

Here is the simplest of all scripts:

from com.pnfsoftware.jeb.client.api import IScript

class JEB2SampleScript(IScript):
  # run() is the script entry point; ctx is an IClientContext (or derived) object
  def run(self, ctx):
    print('Hello, JEB2')

Within the official desktop client, scripts can be executed via the File, Scripts menu item.

Finally, remember that scripts are meant to execute small, light-weight actions. Heavy lifting operations (such as parsing or background event-driven tasks) should be implemented by back-end plugins in Java.

Check out our GitHub repository for more sample scripts.

Developing JEB2 parsers and plugins

Update (11/2): parts 7 and 8 are available.

Our tutorials are available on the JEB2 developer portal, which aggregates all resources for API developers:

JEB2 Developer Portal

  1. Getting Started with Parsers
  2. Creating a Simple Parser
  3. Documents and Delegation
  4. Tables and Trees
  5. Development Tips
  6. Releasing a Plugin
  7. Interactivity
  8. Interactivity, Part 2
  9. Persistence (To be published)

Setting up JEB2 to parse optimized DEX (odex) files

This blog assumes that JEB version 2.1.0 or above is used along with the OAT plugin 1.0.2 or above.

Parsing support for optimized DEX files was added to JEB2 to allow the analysis of non-deodex’ed files. Since ODEX files are target-dependent, the executing Dalvik VM is no longer restricted to regular opcodes. ODEX files may make use of “illegal” opcodes, optimized opcodes, or even the once regular but now dead extended opcodes. Whenever possible, parsing will take place, and the instructions will be displayed in the assembly view.

In the screenshot below, note that opcode 43h (illegal for non-optimized code) is used, as well as iput-wide-volatile (optimized opcode for field access).

In the second screenshot, notice the use of a non-standard jumbo opcode.

If you are analyzing an extracted ODEX file (one whose header bytes start with “dey\n”), then all versions of JEB2 shall be able to process it. The Project tree will look like the following (project > artifact > odex unit > dex unit):
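To tell which case applies before opening a file, the header bytes can be checked by hand. The sketch below is a standalone helper, not part of JEB2: the function name identify_container() is hypothetical, the “dey\n” prefix comes from the text above, and the “dex\n” and “\x7fELF” prefixes are the standard magic values of regular DEX and ELF files (an OAT file is delivered inside an ELF container, so it is reported as "elf" here).

```python
def identify_container(header):
    """Classify a file from its first bytes (hypothetical helper, not a JEB2 API)."""
    if header.startswith(b"dey\n"):
        return "odex"  # extracted optimized DEX
    if header.startswith(b"dex\n"):
        return "dex"  # regular DEX
    if header.startswith(b"\x7fELF"):
        return "elf"  # ELF container, possibly embedding an OAT file
    return "unknown"


def identify_file(path):
    """Read the first 4 bytes of a file and classify them."""
    with open(path, "rb") as f:
        return identify_container(f.read(4))
```

A result of "elf" only says that the file is an ELF image; whether it actually embeds OAT data is left to the ELF and OAT parsers.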

If you are analyzing an OAT file (DEX file precompiled to native and ready to run within the ART runtime), then you will need one additional plugin: the OAT plugin. This plugin can be registered on Business and Enterprise versions of JEB2. (Note: older versions of JEB 2.0, mainly versions 2.0.12 and above, require the third-party ELF plugin as well.)

Installation steps:

  1. Visit our public GitHub account
  2. Download the latest package of the OAT plugin
  3. Drop the JAR file in the coreplugins folder within your JEB2 installation directory
  4. Restart JEB2. The lines “Plugin loaded … OATPlugin” should be visible in the console
The OAT plugin is loaded (the ELF third-party plugin is no longer required with JEB 2.1+)

Now, you may open an OAT file. The project view should be similar to the following (project > artifact > elf unit > oat unit > dex or odex unit):

Here is another example: an ELF file containing an OAT section, which in turn contains 2 optimized DEX files:

An ELF/OAT file containing 2 DEX files
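The nesting described above (artifact > elf unit > oat unit > dex units) can be pictured as a tree that is walked recursively. The following sketch is a toy model only: the Unit class and the find_units() function are hypothetical and do not use the JEB2 API; they simply show how the DEX units buried under an ELF/OAT container could be collected.

```python
class Unit(object):
    """Hypothetical stand-in for a node of the project tree (not the JEB2 IUnit API)."""

    def __init__(self, kind, name, children=None):
        self.kind = kind
        self.name = name
        self.children = children or []


def find_units(root, kind):
    """Depth-first search returning every unit of the given kind, in tree order."""
    found = []
    stack = [root]
    while stack:
        unit = stack.pop()
        if unit.kind == kind:
            found.append(unit)
        # Push children in reverse so that the first child is visited first.
        stack.extend(reversed(unit.children))
    return found


# Toy tree mirroring the example: one ELF, one OAT section, two DEX files.
tree = Unit("artifact", "sample.oat", [
    Unit("elf", "elf image", [
        Unit("oat", "oat section", [
            Unit("dex", "dex #1"),
            Unit("dex", "dex #2"),
        ]),
    ]),
])
dex_names = [u.name for u in find_units(tree, "dex")]
```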

That is it for this blog post. We are planning to release more documentation and tutorials about our APIs in the coming days. In the meantime, remember to check our open-source plugins on GitHub; they are great starting points for anyone interested in writing their own parsers or back-end plugins. Stay tuned, and happy analysis.

JEB2 introduction videos

With the constant improvement of our UI client, new plugins, the upcoming release of the Dalvik debugger, as well as the APIs, JEB2 has become increasingly complex and full-featured.

In order to reduce the barrier of entry for new users, we are going to publish a series of introductory videos aiming to explain how to get started with JEB2 and how to use the RCP client to the fullest. We will strive to keep the videos short and concise, highlighting specific features and gotchas of our software.

You will find the YouTube “JEB2 Discovery” playlist here:

JEB2 available to all customers

Dear users,

We are glad to announce that JEB2 is now available to all our customers!

If you have a valid JEB 1.x license, you should have received an equivalent build with a subscription valid till the end of your support period. If this is not the case, please reach out to support@pnfsoftware.com.

Additional documentation and resources can be found on our website. We are also providing PDF analysis support to all business/enterprise customers! The JEB2 PDF plugin can be downloaded here: https://www.pnfsoftware.com/jeb/plugins.

Open-source plugins and PoC plugins such as Linux ELF, Android OAT, MIPS, Microsoft OLE, XLS, PPT, etc. can and will be found on our GitHub repository here: https://github.com/pnfsoftware.

The back-end APIs will be made available in Fall 2015.

Note: This message was originally published on our Google Groups forum.

Android Dalvik, inside OAT, inside ELF

As a follow up to our last blog on Adobe PDF and Microsoft XLS plugins for JEB2, here is another example of deep analysis support in the case of nested artifacts, as is the case for Android pre-compiled native apps.

Those apps are run using ART, the newest Android Runtime. They are native Linux ELF .so files, embedding a custom Android OAT file, which in turn contains one or more Dalvik DEX bytecode files. See the pictures below for an example:

jeb2-elf-oat-1 jeb2-elf-oat-2

The ELF and OAT plugins will be open sourced.

Stay tuned for more news within the next few days!

JEB2 plugins for document formats

As explained in our June 18 blog, the JEB2 architecture and back-end API allow the development of third-party code: plugins such as disassemblers, decompilers, parsers, and more can be easily integrated to provide analysis capability for virtually any type of data.

We have been working internally on proof-of-concept plugins for various file formats, such as:

An Adobe PDF file format plugin. The plugin provides deep PDF view and navigation, anomaly detection, binary correspondence, and more. Coupled with other JEB2 analysis plugins (such as a JavaScript beautifier plugin), that makes for a powerful PDF reverse-engineering tool:

jeb2-pdf-2 jeb2-pdf-1

jeb2-pdf-0

A Microsoft Compound File / OLE file format plugin, for various document types, such as Excel in the pictures below:

jeb2-ole-4 jeb2-ole-1

jeb2-ole-3 jeb2-ole-2

We also have plugins for ELF object files, MIPS machine code, Android OAT resource extraction, ETC1 image reconstruction, etc.

Plugins can work on several types of input, including the output of other plugins, for recursive and deep analysis of artifact data. We are planning to open up the back-end API shortly after the release of the full versions, which will happen in mid-July.

Some of those plugins will be open-source; we hope they provide great tutorials and insights into plugin writing for the JEB2 back-end API.

What is changing with JEB2

As we announced yesterday, the demo version of JEB2 beta is now available for download! We are very excited about this new product, and here is why:

New features and improvements

  • A complete separation of the front- and back-end

This enables the creation of various front-ends: high-end customers can craft their own clients. They may be graphical, command-line based, or integrated within an automation pipeline.

Connected to that design point is the new, rich UI desktop platform, which had been on our road-map for quite some time. Our customers will now enjoy the power of Eclipse RCP as their primary official front-end.

  • A modular, plugin-based back-end architecture

This allows JEB2 to go beyond Android-only files. Although our primary focus stays on mobile, JEB2 is now able to support any type of binary parser, text beautifier, code disassembler, decompiler, or more generally, input transformer.

JEB2 ships with various modules designed to enable Android static analysis. Other modules will be shipped in the medium-term. Our customers will receive those modules via the traditional update channels. Some will be open-source and available on our GitHub repository.

Application Programming Interfaces (APIs) will allow developers to write their own back-end plugins, back-end transformers, and, in the case of the official RCP front-end, client scripts. Our Full versions will ship with plugins to demonstrate what can be done with the back-end API: we will provide proof-of-concept plugins to support files such as Android OAT, Android JOBB expansion pack, Linux ELF, or MIPS binary code, to name a few.

From an immediate features perspective, JEB2 offers advanced capabilities such as virtual hierarchies and package renaming, optimized memory and computing usage when dealing with big files, multiple views and complete code hierarchies, side-by-side disassembly and decompiled code, and the ability to analyze multiple artifacts within a single session.

JEB2 also supports artifact re-parsing. Recursive processing and artifact analysis delegation, manual or automatic, was a crucial design goal. We will demonstrate those capabilities in future blogs, the user manual, as well as YouTube videos.

A technical note regarding JEB 1.x to JEB2 migration:

  • JEB 1.x database (“JDB”) files are not compatible with their JEB2 counterparts. We may provide a tool to convert or extract the information out of JDB files. However, it is unlikely to ship with the initial release of JEB2;
  • The JEB 1.x API is not compatible with the back-end API of JEB2.

A new subscription model

JEB2 is moving away from the traditional “perpetual license” model. JEB2 remains desktop software, but is now subscription-based. This allows us to:

  • make sure all customers are using the most recent software release, a condition required to provide efficient and timely support;
  • offer flexible plans, ranging from a monthly standard package for one-off consulting tasks, to complete packages with API access, floating seats, and more.

Here are some additional details. We are currently planning to offer three plans: Standard, Business and Enterprise. The Standard package, just like the public demo build, requires an Internet connection to operate. Professional packages (Business and Enterprise) do not: they are fully functional in air-gapped environments, a common industry practice when analyzing malicious code. The professional packages also offer APIs and support levels. Please refer to the pricing page for details.

Customers with a valid JEB 1.x license will receive an equivalent JEB2 subscription till the end of their current support period.

  • JEB 1.x Full → JEB2 Business
  • JEB 1.x Floating → JEB2 Enterprise

All JEB 1.x quotes that were issued before June 17, 2015, will be honored. JEB 1.x will receive fixes for major issues. We may also consult on special requests for JEB 1.x.

Finally, users with a valid JEB 1.x license will be able to use it according to the terms of the original user agreement.

More to come

The official release of JEB2 is being finalized as we speak. Most of the final tweaks will be based on further internal testing and your feedback during this demo period. Give the demo version of our latest beta build a try and let us know your comments via email, forum, or Twitter. We will continue to post on this blog to address questions and provide additional details over the coming weeks. Thank you.