Scanning PDF Files using JEB2

Update (9/13/2017): we open-sourced the PDF plugin. A compiled JAR binary is also available.

Update: Feb. 27: Slides – Automation How-To
Update: Dec. 3: List of notifications

In this blog post, we show how JEB2 can be used as a building block of a file analysis system. We will show how to use the Core API to create a headless client. That client will scan PDF files using the JEB2 PDF Analysis Module. Basics of the IUnit and co. interfaces is also demonstrated.

Source code on GitHub.

Sample execution output produced by the PDF Scanner

As this slide deck shows, the back-end and front-end components of JEB2 are separated. The official RCP desktop client uses the JEB2 Core API; other front-ends, like the PDF scanner, can be built using that same API.

JEB2 HL Architecture Diagram

Creating an Eclipse project

Let’s get started by creating a new code project. We will show how to do this in Eclipse.

0- Check your license of JEB2. Make sure to use a license that supports third-party client creation and the loading of third-party plugin. If you haven’t done so, download and drop the PDF module in your coreplugins/ sub-directory.

1- Clone our sample code repository: git clone https://github.com/pnfsoftware/jeb2-samplecode.git

2- Create a new Java project. The Java source folder should be rooted in the src/ directory.

3- Add the JEB2 back-end as a JAR dependency. The back-end software is contained in the file bin/cl/jeb.jar located within your installation folder. You may also want to link that JAR to the API documentation, contained in the doc/apidoc.jar file, or online at https://www.pnfsoftware.com/jeb2/apidoc

Your Package Explorer view should now look like:

Package explorer view after setting up dependencies

5- Set up the execution options. The required Java properties for execution (jeb.engcfg and jeb.lickey) can be set in the Run Configurations panel (accessible via the Run menu). Example:

Example of a Run configuration

6- Open the com.pnf.pdfscan.PDFScanner source file. You are ready to execute main().

How the scanner works

Now, let’s focus on the scanner source code.

  • The JEB2 back-end is initialized when scanFiles() is called:
    • Use JebCoreService to retrieve an instance to ICoreContext
    • Create an IEnginesContext
    • Load a project within that context (IRuntimeProject)
    • Add artifact(s) and process them (ILiveArtifact)
      • We add a single file artifact per project in this example
    • Retrieve the products (IUnit)
      • We are retrieving the top-most unit only in this example
    • Analyze the unit (see assessPdf())
    • Close the project

[Note: A detailed explanation of the above concepts (core, engines, project, artifacts, units, etc.) is outside the scope of this tutorial. Refer to our Developer Portal for more information.]

Snippet of scanFiles()

The assessPdf() method evaluates PDF units. The evaluation performed by this sample scanner is trivial: we collect the notifications created by the PDF plugin during the analysis of the file, and see if they meet basic criteria.

About the Unit Notifications:

  • Any JEB2 plugin can attach notifications to its units. The PDF plugin does so. Notifications are meant to pin-point noteworthy areas of a unit or artifiact.
  • A notification has a “dangerosity level” ranging from 0 to 100. It also has a description, an optional address to point to which area of the unit the notification is associated with, etc.
  • The API offers standard notification types, ranging from “Interesting area” to “Definitely Malicious”.
Standard notification levels offered in the NotificationType enum

A PDF unit can contain several types of notifications. Example include: corrupt areas in stream; multiple encoding of stream; JavaScript; password-protected stream; invalid/illegal entries in stream; etc.

Link: Complete list of notifications issued by the PDF plugin.

Our simple scanner reports a file as suspicious if it contains at least 2 notifications that have a level >= 70 (POTENTIALLY_HARMFUL). These thresholds can be tweaked in the source code.

The assessPdf() routine

The screenshot below is a sample output produced by the PDF scanner:

Conclusion

The intent of this entry is to shed some light on the process of writing third-party clients for JEB2, as well as what and how to use notifications reported by Units. We encourage you to visit our Developer Portal to find additional documentations as well as the reference Javadoc of the API.

Leave a Reply

Your email address will not be published. Required fields are marked *

*