Preview PDF files as images on your website

by Damiaan Peeters 9. January 2017 20:41

We got a question and said yes.  Can we do that?  Yes, we can!  Then you get back to the office and a cold shiver runs down your spine when you discover there is no NuGet package available to install.

What was the question

The client regularly publishes articles in magazines and journals.  They repost the articles on their own website.  Content marketing, they told me.  On their old website they had to upload a PDF and a separate thumbnail of the PDF.

But dear web builder, you can do better, right? Umbraco is a CMS built for content editors and is easy to extend.  Please put the thumbnail on our website automatically.

The search

There seem to be several products on the market, costing several thousand dollars. But the client would not be willing to pay for licenses. The typical conversation could have gone like this:

You are building on a large ecosystem of free and open software, right? Yes, dear client. And PDF is an open format right? Yes, dear client. I could not be the first ever to ask this, right? No, dear client.  Can't you ask your friendly colleagues of this friendly CMS?  Sure, dear client. (*)

I tried a few managed .NET libraries which I found on the internet.  But most of the free managed PDF libraries you find online

  1. are not working,
  2. are incomplete,
  3. are in beta,
  4. provide only a very small subset of what you can do with PDF,
  5. simply do not support rendering a PDF to another digital format (like PNG or JPEG),
  6. or are a wrapper around (some obscure) unmanaged DLLs.

The only way out would be to start a new managed library to read PDF files.  Which can’t be too hard, because it’s mostly PostScript, right?  But going down that path had already cost several hours.  Precious time which we did not have.  It was the Christmas holiday after all.

From the commercial palette of products, I did try Imazen’s Resizer.Net and it worked, but having 2 image processors was complete overkill, and it did not play along with ImageProcessor.  However, that could have been me messing around with various DLLs.

Because I had lost all hope of finding a fully managed (.NET) solution, and because I had already found several NuGet packages wrapping GhostScript, we decided to settle.  GhostScript is a pretty mature product after all.

The solution: using ImageProcessor and GhostScript(.net)

We chose GhostScript.Net as our wrapper around GhostScript.  Mostly because the logo was very nice, certainly compared to the other wrappers available.

Next we built a small proof of concept (POC) to show that it worked.  The code is pretty simple:

var rasterizer = new GhostscriptRasterizer();
rasterizer.Open(pdfUri, lastInstalledVersionOfGhostScript, false);
// render a single page at the requested horizontal and vertical DPI
System.Drawing.Image img = rasterizer.GetPage(_desiredXDpi, _desiredYDpi, pageNumber);
rasterizer.Close();

This very simple code did what we needed.  It converted the (first) page of a PDF to an image format.  Wrapping it in a .NET handler that returns the image would have been easy, but we wanted a better developer experience and an easier way to resize.
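To get a feeling for the two DPI arguments: PDF page sizes are expressed in points, with 72 points to an inch, so the pixel size of the rasterized page follows directly from the DPI you ask for.  The sketch below is plain Python arithmetic, not GhostScript.Net code; the function names and the A4 page size (roughly 595 x 842 points) are assumptions picked for illustration.

```python
# Sketch: how a requested DPI translates into pixel dimensions.
# PDF user space uses 72 points per inch.
POINTS_PER_INCH = 72


def rasterized_size(width_pt, height_pt, dpi):
    """Pixel size of a page rendered at the given DPI, rounded to whole pixels."""
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


def dpi_for_width(width_pt, target_width_px):
    """DPI at which a page of width_pt points comes out target_width_px pixels wide."""
    return target_width_px * POINTS_PER_INCH / width_pt


if __name__ == "__main__":
    # An A4 page (about 595 x 842 points) at screen resolution
    print(rasterized_size(595, 842, 96))
    # DPI needed to get a thumbnail that is 500 pixels wide
    print(dpi_for_width(595, 500))
```

In practice it is probably simpler to rasterize at one fixed, generous DPI and leave the scaling to the image pipeline.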

ImageProcessor to the rescue

The best advice from last year (2016) is: build on the shoulders of giants.  If you have not met James South: he is a giant.  He is well over 2 meters tall, and strong as an elephant.  He knows how to drink, like a real Scot, and moved to Australia to catch his own food with his bare hands.  Or not.  But he IS a craftsman and the builder of the marvelous ImageProcessor, used by and distributed along with Umbraco.

So we figured: if ImageProcessor could support a different file format, like… PDF, we would have a solution.

It turns out that is not too hard.  Just create a new class that inherits from FormatBase and override the appropriate members.  If you override the Load method of the base class, you can load a PDF, rasterize it, and pass the System.Drawing.Image object to ImageProcessor.  It will handle everything else for you, including resizing through the query string.

For example: /media/1234/myPublication.pdf?width=500

Let us serve other clients too!

We do have other clients uploading PDFs from time to time.  I am pretty sure that we will be installing this plugin for them as well, to avoid the next support request: “image not showing on website”.

The code is available on GitHub: https://github.com/dampee/ImageProcessor.Plugins.Pdf/

A NuGet package is available too: https://www.nuget.org/packages/ImageProcessor.Plugins.PDF/

 

Let me know what you think of the package.

 

(*) this is fiction, they are actually very nice and polite.  It is only me, myself and I taking blame for the path we took.

 

 

 

 

Remove old records from Umbraco cmsPreviewXml table

by Damiaan Peeters 28. November 2015 01:24

I just got a production database.  This database was massive.  One of the big tables was the cmsPreviewXml. 

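To clean up the Umbraco cmsPreviewXml table, run the SQL script below.  As written it only selects the rows, so you can check what would be removed; swap the select line for the commented-out delete to actually remove them.  This deletes all older previews, but keeps the most recent one.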

select *
--delete
from cmsPreviewXml
where versionId in (
    select cmsPreviewXml.versionId
    from cmsPreviewXml join cmsDocument on cmsPreviewXml.versionId=cmsDocument.versionId
    where cmsDocument.newest <> 1)
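If you want to see what the script does before pointing it at a production database, you can try the same statement on a toy database.  The sketch below uses Python's built-in sqlite3 module; the two tables are heavily simplified stand-ins (only the columns the script touches, with made-up rows), so treat the schema as an assumption and not as the real Umbraco one.

```python
import sqlite3

# The shared FROM/WHERE part of the script: every preview whose version
# is not flagged as the newest one in cmsDocument.
OLD_PREVIEWS = """
    from cmsPreviewXml
    where versionId in (
        select cmsPreviewXml.versionId
        from cmsPreviewXml join cmsDocument
          on cmsPreviewXml.versionId = cmsDocument.versionId
        where cmsDocument.newest <> 1)
"""


def build_demo_db():
    """Create an in-memory database with simplified, made-up tables and rows."""
    connection = sqlite3.connect(":memory:")
    connection.executescript("""
        create table cmsDocument (versionId text primary key, nodeId int, newest int);
        create table cmsPreviewXml (nodeId int, versionId text, xml text);
        insert into cmsDocument values
            ('v1', 1, 0), ('v2', 1, 0), ('v3', 1, 1), ('v4', 2, 1);
        insert into cmsPreviewXml values
            (1, 'v1', '<a/>'), (1, 'v2', '<a/>'), (1, 'v3', '<a/>'), (2, 'v4', '<b/>');
    """)
    return connection


def find_old_previews(connection):
    """Dry run: list the versions the delete would remove."""
    rows = connection.execute("select versionId" + OLD_PREVIEWS + " order by versionId")
    return [row[0] for row in rows]


def delete_old_previews(connection):
    """Run the delete and return the number of removed rows."""
    cursor = connection.execute("delete" + OLD_PREVIEWS)
    connection.commit()
    return cursor.rowcount


if __name__ == "__main__":
    db = build_demo_db()
    print("would delete:", find_old_previews(db))
    print("deleted rows:", delete_old_previews(db))
```

The dry run and the delete share the same FROM/WHERE text on purpose, so what you inspect is exactly what gets removed.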

The other scripts I used to clean the Umbraco database are in this gist: https://gist.github.com/dampee/a8ead728165b16d49c00

Remove all Cached folders from ImageGen with PowerShell

by Damiaan Peeters 3. June 2015 22:08

UPDATE: read the comments for simple one-liners!

Sometimes you need to say goodbye to a good friend.  ImageGen served us very well.  But we made the switch to ImageProcessor some time ago.

The only things left behind when you start cleaning the media folder are all the "Cached" folders.  To remove them we wrote a little PowerShell script.


$path = "C:\Projects\parentfoldertoclean"
cd $path

# delete Cached folders
Get-ChildItem -Include Cached -Recurse -Force | Remove-Item -Force -Recurse

# Delete any empty directories left behind after deleting the old files.
Get-ChildItem -Recurse -Force | Where-Object { $_.PSIsContainer -and (Get-ChildItem -Path $_.FullName -Recurse -Force | Where-Object { !$_.PSIsContainer }) -eq $null } | Remove-Item -Force -Recurse

Last Umbraco login straight from SQL

by Damiaan Peeters 25. November 2014 13:33

If you ever want the last logins of your back-end users, this SQL is useful.  It converts the stored .NET ticks to a local date and time:

select userid , userlogin , useremail, DATEADD(ms, ((ticks - 599266080000000000) - 
   FLOOR((ticks - 599266080000000000) / 864000000000) * 864000000000) / 10000,
   DATEADD(d, (ticks - 599266080000000000) / 864000000000, '01/01/1900')) +
   GETDATE() - GETUTCDATE()
from (
SELECT l.[userID] as userid, u.userLogin as userlogin, u.useremail as useremail
      , max([timeout]) as ticks -- latest session per user
  FROM [db16284].[dbo].[umbracoUserLogins] l
  inner join umbracouser u on l.userid = U.id
  where l.userid <> 0
  group by l.userid, u.userLogin, u.useremail
  ) t1
  order by userid desc
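The arithmetic in that query is easier to follow outside SQL.  A tick is 100 nanoseconds, 864000000000 ticks make one day, and 599266080000000000 is the tick count at 1 January 1900.  The query first adds the whole days and then the remaining milliseconds, presumably because DATEADD only accepts an int and the total number of milliseconds would not fit.  Below is a small Python sketch of the same steps; the function name is made up, and it leaves out the GETDATE() - GETUTCDATE() shift, so the result stays in UTC.

```python
from datetime import datetime, timedelta

TICKS_PER_DAY = 864_000_000_000          # 100 ns ticks in one day
TICKS_PER_MS = 10_000                    # 100 ns ticks in one millisecond
TICKS_AT_1900 = 599_266_080_000_000_000  # .NET ticks at 1900-01-01 00:00


def ticks_to_datetime(ticks):
    """Mirror the SQL: whole days since 1900 first, then the remaining milliseconds."""
    since_1900 = ticks - TICKS_AT_1900
    days, rest = divmod(since_1900, TICKS_PER_DAY)
    return datetime(1900, 1, 1) + timedelta(days=days, milliseconds=rest // TICKS_PER_MS)


if __name__ == "__main__":
    # One day and one second after the 1900 offset
    print(ticks_to_datetime(TICKS_AT_1900 + TICKS_PER_DAY + 10_000_000))
```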

Umbraco Backoffice integration with Active Directory

by Damiaan Peeters 12. February 2014 10:43

Add the AD membership provider to the web.config, in the system.web/membership/providers section:

<add name="ADMembershipProvider" type="System.Web.Security.ActiveDirectoryMembershipProvider, System.Web, Version=2.0.0.0,Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a" 
connectionStringName="ADConnectionString"
attributeMapUsername="sAMAccountName" />

Add the matching connection string to the connectionStrings section of the web.config:

<add name="ADConnectionString" connectionString="LDAP://server001/DC=commit,DC=local" />

Then switch the back office provider in config/umbracoSettings.config:

  <providers>
    <users>
      <!-- if you wish to use your own membershipprovider for authenticating to the umbraco back office -->
      <!-- specify it here (remember to add it to the web.config as well) -->
      <!--<DefaultBackofficeProvider>UsersMembershipProvider</DefaultBackofficeProvider>-->
      <DefaultBackofficeProvider>ADMembershipProvider</DefaultBackofficeProvider>
    </users>
  </providers>

Deleting all dictionary items from the Umbraco dictionary

by Damiaan Peeters 21. December 2013 19:38

Sometimes you need to delete all dictionary items from the Umbraco dictionary. 

The problem

Deleting hundreds of dictionary items can be a PITA.  Imagine you have to right-click every single dictionary item and select delete.


Side information

First you probably want to know which tables the dictionary is using.  There are 2: one for the items, one for the translations.  They are called cmsDictionary and cmsLanguageText.

If you ever want to get information from the dictionary, just join these tables and you have what you need.

select d.pk, lt.pk, d.[key], lt.languageId, l.languageISOCode, lt.value
from cmsDictionary d inner join cmsLanguageText lt on d.id = lt.UniqueId
left join umbracoLanguage l on lt.languageId = l.id
where d.pk = 6

Solution

The fastest way to remove all Umbraco dictionary items is through SQL.  To remove ALL dictionary items, just run this SQL script:

delete from cmsLanguageText
delete from cmsDictionary

Don’t forget to touch the web.config afterwards so the application restarts, because dictionary items are heavily cached!

Did you know this is default in Umbraco?

by Damiaan Peeters 8. December 2013 15:18

A few days ago I got this question from a manager: “Is this default in Umbraco?” To be honest, that is a very difficult question that deserves a blog post of its own.  So here it is!  The answer to a general question asked of an Umbraco developer: "Is this default in Umbraco?"

Short answer

It is probably a default setup on how you can work with Umbraco.  So: yes.

Slightly longer answer

It depends of course on what you call “default”.  But the developer you are asking has probably already implemented the requested things a few times.  Been there, done that.  And if he hasn’t, he might have found a blog post or forum topic which explains how to do it.  Or maybe he has found a (free) package which does the things considered to be default.

The reason why the developer can say it is default is that packages, blogs and extending Umbraco ARE default.  Without this default behaviour, you would not have a website.

Long answer

First of all, I wrote earlier that Umbraco is not a CMS, I called it a framework.  Umbraco provides a base for you to build web applications (or websites), without trying to interfere with the developers using it.

Umbraco does not provide strict guidelines on how to set up your website.  It does not force information into predefined concepts.  As such the developer needs to (sic, by default) extend the Umbraco download in order to have a website.  The nice thing about Umbraco is that it doesn’t limit you in how you organise everything to get things done.  It gives you a solid base where you can plug in different things.

The core of Umbraco consists of a number of default modules.  These modules take care of rendering your web pages, building URLs, saving and publishing content items (nodes) and rendering the back-end.  These modules are “default”, but no one would expect otherwise from a CMS.  That's why you are not writing your own rendering engine, but choose a CMS instead.

A lot of modules in the Umbraco core can be replaced or extended to fit your own needs.  I consider replacing these modules, or adding extra functionality to them, ALSO default.  A developer familiar with the relevant topics (extending Umbraco and the developer’s reference) doesn’t need to know all the internals to add easy customizations to your website.  This is where a lot of the power of Umbraco comes from, and to my knowledge it is not always easy to do with other CMSs.  Extending the base is a daily job, and can be considered default.

If the core is not enough, there are a lot of packages.  Packages are pieces of software, developed by third parties (except for Courier, Contour and Concierge), which add extra functionality to Umbraco.  Most packages are free, some are not.  Packages are built for Umbraco, and the Umbraco back-end makes it very easy to install add-ons.  I think we can safely say that most Umbraco installations use packages.  This makes me believe that Umbraco packages are default.  If someone would try to argue against this, consider that there are packages which are now included (entirely or in part) in the core (like uGoLive, uComponents, …).

One of the great side effects of having such a robust core is that a lot of package creators implement the same robust principles in their software.  This makes it easy for package developers to plug into events of third-party packages, and “customize” them further.

If you have read this far, you might wonder: all this customization sounds pretty expensive.  I can’t speak for other companies, or for what they charge for certain functionality.  In our experience, Umbraco offers a lot of extendable default functionality that speeds up development.  This gives the customizations in question a much higher return on investment compared to Umbraco-less solutions.

So what do you think?  What do you consider "default" in Umbraco?

Smart homepage switching on HTTP accept-language headers

by Damiaan Peeters 1. December 2013 10:18

Http Headers

Every time a browser requests a webpage from a web server, headers are added to the request.  They carry information like: give me page X, I accept HTML, you can gzip or deflate the data, proxy information, …

One of these HTTP headers is the “Accept-Language” header.  So what does this header mean?  The HTTP/1.1 specification puts it like this:

Each language-range MAY be given an associated quality value which represents an estimate of the user's preference for the languages specified by that range.

If you open the developer tools of your browser (hit F12), you can find the headers of every request.  The Accept-Language header can look like this:

[image: the Accept-Language request header as shown in the browser developer tools]

This means: I prefer Dutch, but will accept US English or any English dialect if available.  Every user can change the browser settings to reflect their language preference.
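To make the quality values concrete, here is a small Python sketch that orders the language ranges of such a header by preference.  The header value in the example is an assumption that matches the description above (Dutch first, then US English, then any English); the function name is made up, and real browsers may send different values.

```python
def parse_accept_language(header):
    """Return the language ranges of an Accept-Language value, most preferred first."""
    ranges = []
    for position, part in enumerate(header.split(",")):
        pieces = [piece.strip() for piece in part.split(";")]
        tag = pieces[0]
        if not tag:
            continue
        quality = 1.0  # a range without an explicit q counts as q=1
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # unreadable q: treat the range as not acceptable
        if quality > 0:  # q=0 means "not acceptable"
            ranges.append((-quality, position, tag))
    # Highest quality first; ties keep the order in which the browser sent them
    return [tag for _, _, tag in sorted(ranges)]


if __name__ == "__main__":
    # Assumed example value: Dutch, then US English, then any English
    print(parse_accept_language("nl,en-US;q=0.8,en;q=0.6"))
```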

Sometimes, from a user experience perspective, using the HTTP headers to show content the user can understand might be a good idea.  UX experts argue both for and against it.  We won’t go into that discussion because this is mainly a technical blog.  Let’s just try to implement it, because we can!  :-)

The solution

I’ll be using the IContentFinder for the solution. Never heard of the ContentFinderResolver?  Then read my previous blog post or the official Umbraco documentation.

I think that this is one of the cases where you can use IContentFinder to control what Umbraco serves to the rendering process.  We need to start with adding a new entry to the ContentFinderResolver.  This time we want it to be launched before the default umbraco implementation.  We can do this by using the following code:

ContentFinderResolver.Current.InsertTypeBefore<ContentFinderByNiceUrl, MyCustomHttpAcceptLanguageContentFinder>();

The rest of the idea is simple: for every root content node, look up its “Domain” objects and check whether the language matches.

try {
  // Primary language of the visitor's first preference, e.g. "nl" from "nl-BE"
  string acceptLanguage = HttpContext.Current.Request.UserLanguages[0].Split('-')[0];
  string domainName = System.Web.HttpContext.Current.Request.Url.Host.ToLower();
  var rootDocs = UmbracoContext.Current.Application.Services.ContentService.GetRootContent();
  foreach (var rootDoc in rootDocs)
  {
    var domains = Domain.GetDomainsById(rootDoc.Id);

    var domainmatch = domains.FirstOrDefault(domain => domain.Language.CultureAlias.StartsWith(acceptLanguage));
    if (domainmatch != null)
    {
        // PublishedContent expects an IPublishedContent, so look the node up in the content cache
        contentRequest.PublishedContent = UmbracoContext.Current.ContentCache.GetById(rootDoc.Id);
        break; // first match wins, stop looking
    }
  }
} catch {
  // UserLanguages is null when no Accept-Language header is sent (search engines, for example)
}

Attention, this is a very basic implementation and not ready for production at all! 

The SEO Warning

As mentioned in the code: take care when implementing a solution using Accept-Language.  Googlebot is NOT sending any Accept-Language headers along.  So be sure that you don’t get trapped into sending empty pages or (500) errors to the search engines.
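One way to stay out of that trap is to always end up with a site, even when the visitor sends no language preference at all.  The Python sketch below shows only the selection logic, not Umbraco code: it walks through all accepted languages instead of only the first one, prefers an exact culture match, and falls back to a default.  The function name, the culture-to-site mapping and the site names are made up for illustration.

```python
def pick_site(accepted_languages, sites, default_site):
    """
    accepted_languages: language tags in order of preference, e.g. ["nl-BE", "en"].
    sites: mapping of culture code (e.g. "nl-BE") to a site identifier.
    default_site: returned when nothing matches or no header was sent.
    """
    for language in accepted_languages:
        wanted = language.lower()
        primary = wanted.split("-")[0]
        # An exact culture match wins over a match on the primary language only
        for culture, site in sites.items():
            if culture.lower() == wanted:
                return site
        for culture, site in sites.items():
            if culture.lower().split("-")[0] == primary:
                return site
    return default_site


if __name__ == "__main__":
    sites = {"nl-BE": "belgian-site", "en-US": "american-site"}
    print(pick_site(["nl", "en"], sites, "american-site"))
    # A crawler without an Accept-Language header gets the default site
    print(pick_site([], sites, "american-site"))
```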

With that I would like to ask, do you consider this as a valid use of IContentFinder?  What other Content Finders do you have in mind?  Have you already used the ContentFinderResolver?

From the legacy INotFoundHandler to IContentFinder.

by Damiaan Peeters 21. November 2013 09:05

If you are developing a custom “page not found” policy in Umbraco, then you know the “INotFoundHandler” pretty well.

The old procedure is pretty straightforward. Create a new class which implements the INotFoundHandler interface and add an extra line in your “404handlers.config” file.  Done.

Did you know the INotFoundHandlers are replaced by IContentFinder?

Why bother

Are you really wondering why you would use the new interface “IContentFinder” while the INotFoundHandler is still working?

First of all, the new IContentFinder  is documented.  How awesome is that? 

The INotFoundHandlers will become obsolete in the future.  So no reason to stay on legacy stuff. 

ContentFinders are very stable: Umbraco v6 already uses ContentFinders to serve your content.  This is not unstable alpha stuff you are looking at.

Notice the name change.  We go from “not found” to “content finder”.  That means that you can do a lot more than just handling not found requests.  That’s right!  You can now write your own blasting super geeky content finder which can serve any IPublishedContent (probably content from the tree). 

This new IContentFinder is a part of the request pipeline.  That means that IContentFinder classes can handle any request which is handled by Umbraco.  But this also means that you can insert your custom class before the normal Umbraco flow of searching elements by the “nice url”. 

A few examples:

  • If you don’t like the awesome 301 package UrlTracker by kipusoep, and you are considering building your own, you would want to use the new IContentFinder interface.
  • If you are working with a multi-site, multi-language setup with a difficult 404 page configuration, you just write your own 404 handler (and call the SetIs404() method on the PublishedContentRequest).
  • If you want to write your own rewriting rules against some external database, you could use the IContentFinder.
  • Serve content from a custom data source.  I’ll try to discuss this briefly in another post.

How it works

First write your own ContentFinder.  You can do this by creating a new class which implements the IContentFinder interface.  The only method you need is TryFindContent.  Set the “PublishedContent” property to the node you want returned to the user and return TRUE.  If your content finder did not find any content, return FALSE so the other finders can have their shot.

If you have a node you want to show as a 404 page, set it as the “PublishedContent” property, call SetIs404 and return TRUE.

public class MyCustomContentFinder : IContentFinder
{
    public bool TryFindContent(PublishedContentRequest contentRequest)
    {
        if (contentRequest == null)
        {
            return false;
        }

        LogHelper.Debug<MyCustomContentFinder>("TryFindContent({0})", () => contentRequest.Uri.ToString());

        var contentCache = UmbracoContext.Current.ContentCache;
        var foundContent = contentCache.GetById(1234); // 1234 is a placeholder node id
        contentRequest.PublishedContent = foundContent;
        // contentRequest.SetIs404(); // uncomment to serve this node with a 404 status
        return contentRequest.PublishedContent != null;
    }
}

To let Umbraco use the IContentFinder above, you will need to add the class to the ContentFinderResolver.  In this case we will insert it before the legacy “NotFoundHandlers”:

public class Custom404Launcher : ApplicationEventHandler
{
    protected override void ApplicationStarting(UmbracoApplicationBase umbracoApplication, ApplicationContext applicationContext)
    {
        LogHelper.Info<Custom404Launcher>("Attaching MyCustomContentFinder as IContentFinder");
        ContentFinderResolver.Current.InsertTypeBefore<ContentFinderByNotFoundHandlers, MyCustomContentFinder>();
    }
}
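If the idea of a chain of finders is new to you, the Python sketch below mimics it outside Umbraco: every finder gets a chance in order, the first one that returns true wins, and a custom finder is inserted just before the fallback.  All class and method names here are made up for illustration; this is not the Umbraco API, only the pattern behind it.

```python
class Request:
    """Hypothetical stand-in for a published content request."""

    def __init__(self, path):
        self.path = path
        self.content = None
        self.is_404 = False


class ByNiceUrl:
    """Finds content by its URL in a dictionary of known pages."""

    def __init__(self, pages):
        self.pages = pages

    def try_find_content(self, request):
        request.content = self.pages.get(request.path)
        return request.content is not None


class CustomFinder:
    """Our own finder: only answers for one special path."""

    def try_find_content(self, request):
        if request.path != "/promo":
            return False  # let the next finder have its shot
        request.content = "Promo page"
        return True


class NotFoundFallback:
    """Last resort: always answers, but flags the request as a 404."""

    def try_find_content(self, request):
        request.content = "Friendly 404 page"
        request.is_404 = True
        return True


class FinderChain:
    def __init__(self):
        self._finders = []

    def append(self, finder):
        self._finders.append(finder)

    def insert_before(self, existing_type, finder):
        """Insert finder right before the first finder of existing_type."""
        for index, current in enumerate(self._finders):
            if isinstance(current, existing_type):
                self._finders.insert(index, finder)
                return
        raise ValueError("no finder of type " + existing_type.__name__)

    def resolve(self, request):
        """Return the name of the finder that handled the request, or None."""
        for finder in self._finders:
            if finder.try_find_content(request):
                return type(finder).__name__
        return None


if __name__ == "__main__":
    chain = FinderChain()
    chain.append(ByNiceUrl({"/about": "About page"}))
    chain.append(NotFoundFallback())
    chain.insert_before(NotFoundFallback, CustomFinder())
    for path in ("/about", "/promo", "/missing"):
        print(path, "->", chain.resolve(Request(path)))
```

Because the custom finder sits before the fallback, it only sees requests the regular URL lookup could not answer.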

Watch (tail) the umbraco Log file with powershell

by Damiaan Peeters 6. November 2013 19:02

Situation

Sometimes you want to watch the Umbraco log file.  You double-click the file in Explorer time after time, scroll down and look at what was added at the bottom.  If you are lazier (smarter?), you open WebMatrix or Visual Studio, leave the file open and just click yes when it reloads.

If you are still looking for the log file: go to /App_Data/Logs/UmbracoTraceLog.txt

Watching while it moves?

If you have ever used Linux, then you have probably been missing the tail command for ages.  We have a solution: PowerShell to the rescue!  Make sure you have PowerShell version 3 or later installed.

Then create a new PowerShell file (e.g. mylogviewer.ps1) in the root of the website. Paste in the code below:

gc App_Data\Logs\UmbracoTraceLog.txt -Tail 10 -Wait

When you run this PowerShell command, you will see that PowerShell stays active and updates the screen as soon as new lines arrive in your log file.  If you don't want to create the file yourself, I have added it compressed below.
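For the curious, the two switches are not magic.  The Python sketch below shows roughly what -Tail and -Wait boil down to: read the last few lines, then keep polling the end of the file for new ones.  It is a simplified illustration with made-up function names, not how PowerShell implements it.

```python
import os
import time
from collections import deque


def last_lines(path, count=10):
    """Return the last `count` lines of a text file (what -Tail does)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        # A bounded deque keeps only the most recent `count` lines in memory
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


def follow(path, poll_seconds=0.5):
    """Yield lines appended after iteration starts (what -Wait does); never ends by itself."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        handle.seek(0, os.SEEK_END)  # skip everything that is already there
        while True:
            line = handle.readline()
            if line:
                yield line.rstrip("\n")
            else:
                time.sleep(poll_seconds)  # nothing new yet, check again shortly


if __name__ == "__main__":
    log_file = os.path.join("App_Data", "Logs", "UmbracoTraceLog.txt")
    for line in last_lines(log_file):
        print(line)
    for line in follow(log_file):  # stop with Ctrl+C
        print(line)
```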

LogTail.zip (170.00 bytes)

Who.I.am

Certified Umbraco Master, Part of Umbraco Certified partner comm-it, .Net and Azure developer, seo lover. Magician in my spare time.
