{"id":414,"date":"2015-12-01T11:08:53","date_gmt":"2015-12-01T19:08:53","guid":{"rendered":"https:\/\/www.pnfsoftware.com\/blog\/?p=414"},"modified":"2017-09-13T00:51:25","modified_gmt":"2017-09-13T08:51:25","slug":"scanning-pdf-files-using-jeb2","status":"publish","type":"post","link":"https:\/\/www.pnfsoftware.com\/blog\/scanning-pdf-files-using-jeb2\/","title":{"rendered":"Scanning PDF Files using JEB2"},"content":{"rendered":"<p><strong><em>Update (9\/13\/2017): we open-sourced the <a href=\"https:\/\/github.com\/pnfsoftware\/jeb2-plugin-pdf\">PDF plugin<\/a>. A compiled JAR binary is also <a href=\"https:\/\/github.com\/pnfsoftware\/jeb2-plugin-pdf\/tree\/master\/out\">available<\/a>.<\/em><\/strong><\/p>\n<p><em>Update: Feb. 27: <a href=\"https:\/\/docs.google.com\/presentation\/d\/1PzqNg026HBflOozB7g_Vsxy-XfDQm0JQynsph5tfXuQ\/pub\" target=\"_blank\" rel=\"noopener\">Slides &#8211; Automation How-To<\/a><br \/>\nUpdate: Dec. 3: <a href=\"https:\/\/github.com\/pnfsoftware\/jeb2-plugin-pdf\/wiki\/Notifications\">List of notifications<\/a><\/em><\/p>\n<p>In this blog post, we show how JEB2 can be used as a building block of a file analysis system. We will show how to use the <a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\">Core API<\/a> to <strong>create a headless client<\/strong>. That client will <strong>scan PDF files<\/strong> using the <a href=\"https:\/\/www.pnfsoftware.com\/jeb\/pdfplugin\">JEB2 PDF Analysis Module<\/a>. Basics of the <a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/units\/IUnit.html\">IUnit<\/a> and co. 
interfaces are also demonstrated.<\/p>\n<p style=\"text-align: center;\"><strong><a href=\"https:\/\/github.com\/pnfsoftware\/jeb2-samplecode\/blob\/master\/src\/com\/pnf\/pdfscan\/PDFScanner.java\">Source code on GitHub.<\/a><\/strong><\/p>\n<figure style=\"width: 735px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2015\/12\/366a7023e3c1e4f0e50529b94717e3e0.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2015\/12\/366a7023e3c1e4f0e50529b94717e3e0.png\" alt=\"\" width=\"735\" height=\"317\" \/><\/a><figcaption class=\"wp-caption-text\">Sample execution output produced by the PDF Scanner<\/figcaption><\/figure>\n<p>As <a href=\"https:\/\/docs.google.com\/presentation\/d\/1kU_ko8e8WlUH4dkXZguJ0Ke2ZfwUZV4JV01D7cChgIc\/pub?slide=id.gbd40314d6_1_0\">this slide deck<\/a> shows, the back-end and front-end components of JEB2 are separated. The official RCP desktop client uses the JEB2 Core API; other front-ends, like the PDF scanner, can be built using that same API.<\/p>\n<figure style=\"width: 895px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2015\/12\/d189c9b10389bde2b52a4aaf17f89763.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2015\/12\/d189c9b10389bde2b52a4aaf17f89763.png\" alt=\"\" width=\"895\" height=\"502\" \/><\/a><figcaption class=\"wp-caption-text\">JEB2 HL Architecture Diagram<\/figcaption><\/figure>\n<h1>Creating an Eclipse project<\/h1>\n<p>Let&#8217;s get started by creating a new code project. We will show how to do this in Eclipse.<\/p>\n<p><strong>0- Check your JEB2 license.<\/strong> Make sure to use a license that supports third-party client creation and the loading of third-party plugins. 
If you haven&#8217;t done so, <a href=\"https:\/\/www.pnfsoftware.com\/jeb\/pdfplugin\">download and drop the PDF module<\/a> in your <strong>coreplugins\/<\/strong>\u00a0sub-directory.<\/p>\n<p><strong>1- Clone our sample code repository:\u00a0<\/strong><span style=\"font-family: monospace, serif; font-size: 15px; white-space: pre-wrap;\">git clone\u00a0https:\/\/github.com\/pnfsoftware\/jeb2-samplecode.git<\/span><\/p>\n<p><strong>2- Create a new Java project.<\/strong>\u00a0The Java source folder should be rooted in the src\/ directory.<\/p>\n<p><strong>3- Add the JEB2 back-end as a JAR dependency.<\/strong> The back-end software is contained\u00a0in the file <em>bin\/cl\/jeb.jar<\/em> located within your installation folder. You may also want to link that JAR to the API documentation, contained in the <em>doc\/apidoc.jar<\/em> file, or online at <em>https:\/\/www.pnfsoftware.com\/jeb\/apidoc<\/em><\/p>\n<p>Your Package Explorer view should now look like:<\/p>\n<figure style=\"width: 991px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2015\/12\/73264bb5b5d9ad82f83a02f60d1070c5.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2015\/12\/73264bb5b5d9ad82f83a02f60d1070c5.png\" alt=\"\" width=\"991\" height=\"540\" \/><\/a><figcaption class=\"wp-caption-text\">Package explorer view after setting up dependencies<\/figcaption><\/figure>\n<p><strong>5- Set up the execution options.<\/strong> The required Java properties for execution (<em>jeb.engcfg<\/em> and <em>jeb.lickey<\/em>) can be set in the Run Configurations panel (accessible via the Run menu). 
Example:<\/p>\n<figure style=\"width: 770px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2015\/12\/28628982e834cb9439baad3160290a8f.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2015\/12\/28628982e834cb9439baad3160290a8f.png\" alt=\"\" width=\"770\" height=\"438\" \/><\/a><figcaption class=\"wp-caption-text\">Example of a Run configuration<\/figcaption><\/figure>\n<p><strong>6-\u00a0Open the com.pnf.pdfscan.PDFScanner\u00a0source file.<\/strong> You are ready to execute main().<\/p>\n<h1>How the scanner works<\/h1>\n<p>Now, let&#8217;s focus on the scanner <a href=\"https:\/\/github.com\/pnfsoftware\/jeb2-samplecode\/blob\/master\/src\/com\/pnf\/pdfscan\/PDFScanner.java\">source code<\/a>.<\/p>\n<ul>\n<li>The JEB2 back-end is initialized when scanFiles() is called:\n<ul>\n<li>Use <a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/JebCoreService.html\">JebCoreService<\/a> to retrieve an instance of <a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/ICoreContext.html\">ICoreContext<\/a><\/li>\n<li>Create an <a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/IEnginesContext.html\">IEnginesContext<\/a><\/li>\n<li>Load a project within that context (<a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/IRuntimeProject.html\">IRuntimeProject<\/a>)<\/li>\n<li>Add artifact(s) and process them (<a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/ILiveArtifact.html\">ILiveArtifact<\/a>)\n<ul>\n<li>We add a single file artifact per project in this example<\/li>\n<\/ul>\n<\/li>\n<li>Retrieve the products (<a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/units\/IUnit.html\">IUnit<\/a>)\n<ul>\n<li>We are 
retrieving the top-most unit only in this example<\/li>\n<\/ul>\n<\/li>\n<li>Analyze the unit (see assessPdf())<\/li>\n<li>Close the project<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><em>[Note: A detailed explanation of the above concepts (core, engines, project, artifacts, units, etc.) is outside the scope of this tutorial. Refer to our Developer Portal for more information.]<\/em><\/p>\n<figure style=\"width: 776px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2015\/12\/e9a9dca33f4684601a98057255f0ce10.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2015\/12\/e9a9dca33f4684601a98057255f0ce10.png\" alt=\"\" width=\"776\" height=\"488\" \/><\/a><figcaption class=\"wp-caption-text\">Snippet of scanFiles()<\/figcaption><\/figure>\n<p>The assessPdf() method evaluates PDF units. The evaluation performed by this sample scanner is trivial: we collect the <a href=\"https:\/\/www.pnfsoftware.com\/jeb\/apidoc\/reference\/com\/pnfsoftware\/jeb\/core\/units\/IUnitNotification.html\">notifications<\/a> created by the PDF plugin during the analysis of the file, and see if they meet basic criteria.<\/p>\n<p>About the <em>Unit Notifications<\/em>:<\/p>\n<ul>\n<li>Any JEB2 plugin can attach notifications to its units. The PDF plugin does so. <strong>Notifications are meant to pinpoint <em>noteworthy<\/em> areas of a unit or artifact<\/strong>.<\/li>\n<li>A notification has a &#8220;dangerosity level&#8221; ranging from 0 to 100. 
It also has a description, an optional address indicating which area of the unit the notification is associated with,\u00a0etc.<\/li>\n<li>The API offers standard notification types, ranging from &#8220;Interesting area&#8221; to &#8220;Definitely Malicious&#8221;.<\/li>\n<\/ul>\n<figure style=\"width: 507px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2015\/12\/f0efef57d709be00b2b06454c3a6dd69.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2015\/12\/f0efef57d709be00b2b06454c3a6dd69.png\" alt=\"\" width=\"507\" height=\"408\" \/><\/a><figcaption class=\"wp-caption-text\">Standard notification levels offered in the NotificationType enum<\/figcaption><\/figure>\n<p>A PDF unit can contain several types of notifications. Examples include: corrupt areas in stream; multiple encoding of stream; JavaScript; password-protected stream; invalid\/illegal entries in stream; etc.<\/p>\n<p><strong><a href=\"https:\/\/github.com\/pnfsoftware\/jeb2-plugin-pdf\/wiki\/Notifications\">Link: Complete\u00a0list of notifications issued by the PDF plugin.<\/a><\/strong><\/p>\n<p>Our simple scanner reports a file as suspicious if it contains at least 2 notifications that have a level &gt;= 70 (POTENTIALLY_HARMFUL). 
These thresholds can be tweaked in the source code.<\/p>\n<figure style=\"width: 724px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2015\/12\/893e791bf39ab1baa652de5c3b2255d3.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2015\/12\/893e791bf39ab1baa652de5c3b2255d3.png\" alt=\"\" width=\"724\" height=\"243\" \/><\/a><figcaption class=\"wp-caption-text\">The assessPdf() routine<\/figcaption><\/figure>\n<p>The screenshot below is a sample output produced by the PDF scanner:<\/p>\n<p><a href=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2015\/12\/366a7023e3c1e4f0e50529b94717e3e0.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone\" src=\"https:\/\/www.pnfsoftware.com\/blog\/wp-content\/uploads\/2015\/12\/366a7023e3c1e4f0e50529b94717e3e0.png\" alt=\"\" width=\"735\" height=\"317\" \/><\/a><\/p>\n<h1>Conclusion<\/h1>\n<p>The intent of\u00a0this entry is to shed some light on the process of writing third-party clients for JEB2, and on how to use the notifications reported by units. We encourage you to visit our\u00a0<a href=\"https:\/\/www.pnfsoftware.com\/jeb\/devportal\">Developer Portal<\/a>\u00a0to find additional documentation as well as the reference Javadoc of the API.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Update (9\/13\/2017): we open-sourced the PDF plugin. A compiled JAR binary is also available. Update: Feb. 27: Slides &#8211; Automation How-To Update: Dec. 3: List of notifications In this blog post, we show how JEB2 can be used as a building block of a file analysis system. 
We will show how to use the Core &hellip; <a href=\"https:\/\/www.pnfsoftware.com\/blog\/scanning-pdf-files-using-jeb2\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Scanning PDF Files using JEB2<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9,8,2,11],"tags":[],"class_list":["post-414","post","type-post","status-publish","format-standard","hentry","category-api-jeb2","category-jeb2","category-malware","category-pdf"],"_links":{"self":[{"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/posts\/414","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/comments?post=414"}],"version-history":[{"count":0,"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/posts\/414\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/media?parent=414"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/categories?post=414"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.pnfsoftware.com\/blog\/wp-json\/wp\/v2\/tags?post=414"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}