Preview PDF files as images on your website

by Damiaan Peeters 9. January 2017 20:41

We got a question and said yes.  Can we do that?  Yes, we can!  Then you get back to the office and feel a cold shiver run down your spine when you discover there is no NuGet package available for you to install.

What was the question?

The client regularly publishes articles in magazines and journals.  They repost each article on their own website.  Content marketing, they told me.  On their old website they had to upload both a PDF and a thumbnail of the PDF.

But dear web builder, you can do better, right?  Umbraco is a CMS built for content editors and is easy to extend.  Please generate the thumbnail automatically on our website.

The search

There seem to be several products on the market, priced at several thousands of dollars.  But the client was not willing to pay for licenses.  The typical conversation could have gone like this:

You are building on a large ecosystem of free and open software, right?  Yes, dear client.  And PDF is an open format, right?  Yes, dear client.  I can't be the first ever to ask this, right?  No, dear client.  Can't you ask your friendly colleagues of this friendly CMS?  Sure, dear client. (*)

I tried a few managed .NET libraries which I found on the internet.  But most of the free managed PDF libraries you find online are

  1. not working,
  2. incomplete,
  3. in beta,
  4. covering only a very small subset of what you can do with PDF,
  5. unable to render a PDF to another digital format (like PNG or JPEG),
  6. or a wrapper around (some obscure) unmanaged DLLs.

The only way out would be to start a new managed library to read PDF files.  Which can’t be too hard, because it’s mostly PostScript, right?  But going down that path had already taken several hours.  Precious time which we did not have.  It was the Christmas holiday after all.

From the commercial palette of products, I did try Imazen’s ImageResizer and it worked, but having 2 image processors was complete overkill, and it did not play along with ImageProcessor.  However, that could have been me messing around with various DLLs.

Because I had lost all hope of finding a completely managed (.NET) solution, and because I had already found several NuGet packages wrapping Ghostscript, we decided to settle.  Ghostscript is a pretty mature product after all.

The solution: using ImageProcessor and Ghostscript(.NET)

We chose Ghostscript.NET as our wrapper around Ghostscript.  Mostly because the logo was very nice, certainly compared to the other wrappers available.

Next we built a small proof of concept (POC) to prove it worked.  The code is pretty simple:

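// Requires the Ghostscript.NET NuGet package and a local Ghostscript installation.
var rasterizer = new GhostscriptRasterizer();
rasterizer.Open(pdfUri, lastInstalledVersionOfGhostScript, false);
// Render a single page of the PDF to a bitmap at the requested resolution.
System.Drawing.Image img = rasterizer.GetPage(_desiredXDpi, _desiredYDpi, pageNumber);
rasterizer.Close();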

This very simple code did what we needed.  It converted the (first) page of a PDF to an image.  It was very simple to create a .NET handler around this that returns an image.  But we wanted a better developer experience, and easier resizing as well.

ImageProcessor to the rescue

The best advice from last year (2016) is: build on the shoulders of giants.  If you have not met James South, he’s a giant.  He is way over 2 meters tall, and strong as an elephant.  He knows how to drink like a real Scot, and moved to Australia to catch his own food with his bare hands.  Or not.  But he IS a craftsman and the builder of the marvelous ImageProcessor, used by and distributed along with Umbraco.

So we figured: if ImageProcessor could support a different file format, like… PDF, we would have a solution.

It turns out that is not too hard.  Just create a new class that inherits from FormatBase and override the appropriate members.  If you override the Load method of the base class, you can load a PDF, rasterize it, and pass the resulting System.Drawing.Image object to ImageProcessor.  It will handle everything else for you.
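To make that a bit more concrete, here is a minimal sketch of such a format class.  It is not necessarily how the published plugin is written.  The class name PdfFormat, the fixed 96 DPI, the choice to serve the result as JPEG, and rendering only the first page are all my assumptions for illustration.  It also assumes the ImageProcessor 2.x FormatBase members and the three-argument GetPage overload of Ghostscript.NET used in the POC above; check both against the versions you install.

```csharp
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Text;
using Ghostscript.NET;
using Ghostscript.NET.Rasterizer;
using ImageProcessor.Imaging.Formats;

// Hypothetical class name; a sketch, not the plugin's actual source.
public class PdfFormat : FormatBase
{
    // Assumption: 96 DPI is enough for a thumbnail; raise it for sharper output.
    private const int Dpi = 96;

    // PDF files start with the ASCII bytes "%PDF".
    public override byte[][] FileHeaders => new[] { Encoding.ASCII.GetBytes("%PDF") };

    public override string[] FileExtensions => new[] { "pdf" };

    // Assumption: the rasterized page is written out as a JPEG,
    // so the format reports a JPEG MIME type and image format.
    public override string MimeType => "image/jpeg";

    public override ImageFormat ImageFormat => ImageFormat.Jpeg;

    // Rasterize the PDF instead of decoding a regular bitmap.
    public override Image Load(Stream stream)
    {
        GhostscriptVersionInfo version = GhostscriptVersionInfo.GetLastInstalledVersion();

        using (var rasterizer = new GhostscriptRasterizer())
        {
            rasterizer.Open(stream, version, false);

            // Render only the first page (page numbers start at 1).
            return rasterizer.GetPage(Dpi, Dpi, 1);
        }
    }
}
```

Once Load returns a regular System.Drawing.Image, the rest of the pipeline (resizing, cropping, caching) does not need to know the source was a PDF.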

For example, requesting /media/1234/myPublication.pdf?width=500 returns a preview image of the PDF, 500 pixels wide.

Let us serve other clients too!

We do have other clients uploading PDFs from time to time.  I am pretty sure that we will be installing this plugin for them as well, to avoid the next support request: “image not showing on website”.

The code is available on GitHub: https://github.com/dampee/ImageProcessor.Plugins.Pdf/

A NuGet package is available too: https://www.nuget.org/packages/ImageProcessor.Plugins.PDF/

Let me know what you think of the package.

(*) This is fiction; they are actually very nice and polite.  It is only me, myself and I who take the blame for the path we took.

Who.I.am

Certified Umbraco Master, part of Umbraco Certified Partner comm-it, .NET and Azure developer, SEO lover.  Magician in my spare time.
