From the legacy INotFoundHandler to IContentFinder.

by Damiaan Peeters 21. November 2013 09:05

If you are developing a custom “page not found” policy in Umbraco, then you probably know the “INotFoundHandler” pretty well.

The old procedure is pretty straightforward. Create a new class which implements the INotFoundHandler interface and add an extra line in your “404handlers.config” file.  Done.

Did you know that the INotFoundHandlers have been replaced by IContentFinder?

Why bother

Are you really wondering why you would use the new “IContentFinder” interface while the INotFoundHandler is still working?

First of all, the new IContentFinder  is documented.  How awesome is that? 

The INotFoundHandlers will become obsolete in the future.  So no reason to stay on legacy stuff. 

ContentFinders are very stable: Umbraco v6 already uses ContentFinders to serve your content. This is not unstable alpha stuff you are looking at.

Notice the name change.  We go from “not found” to “content finder”.  That means that you can do a lot more than just handling not found requests.  That’s right!  You can now write your own blasting super geeky content finder which can serve any IPublishedContent (probably content from the tree). 

This new IContentFinder is a part of the request pipeline.  That means that IContentFinder classes can handle any request which is handled by Umbraco.  But this also means that you can insert your custom class before the normal Umbraco flow of searching elements by the “nice url”. 

A few examples:

  • If you don’t like the awesome 301 package UrlTracker by kipusoep, and you are considering building your own, you would want to use the new IContentFinder interface. 
  • If you are working with a multi-site, multi-language installation with a difficult 404 page setup, you can just write your own 404 handler (and call the SetIs404() method on the PublishedContentRequest).
  • If you want to write your own rewriting rules against some external database, you could use the IContentFinder.
  • Serve content from a custom datasource.  I’ll try to discuss this briefly in another post.

How it works

First write your own ContentFinder. You can do this by creating a new class which implements the IContentFinder interface. The only method you need is TryFindContent. Set the “PublishedContent” property of the request to the node you want returned to the user and return TRUE. If your content finder did not find any content, return FALSE so the other finders can give it their shot.

If you have a node you want to show as a 404 page, assign that node to the “PublishedContent” property, call SetIs404() and return TRUE.

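using Umbraco.Core.Logging;
using Umbraco.Web;
using Umbraco.Web.Routing;

public class MyCustomContentFinder : IContentFinder
{
    public bool TryFindContent(PublishedContentRequest contentRequest)
    {
        // Guard first: without a request there is nothing to resolve (or to log).
        if (contentRequest == null)
        {
            return false;
        }

        LogHelper.Debug<MyCustomContentFinder>("TryFindContent({0})", () => contentRequest.Uri.ToString());

        // 1234 is a placeholder id; look up whichever node you want to serve.
        var contentCache = UmbracoContext.Current.ContentCache;
        var foundContent = contentCache.GetById(1234);
        contentRequest.PublishedContent = foundContent;

        // Uncomment to serve the node with a 404 status code.
        // contentRequest.SetIs404();

        // TRUE ends the search; FALSE lets the next content finder try.
        return contentRequest.PublishedContent != null;
    }
}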

To let Umbraco use the IContentFinder above, you will need to add the class to the ContentFinderResolver.  In this case we will insert it before the legacy “NotFoundHandlers”:

using Umbraco.Core;
using Umbraco.Core.Logging;
using Umbraco.Web.Routing;

public class Custom404Launcher : ApplicationEventHandler
{
    protected override void ApplicationStarting(UmbracoApplicationBase umbracoApplication, ApplicationContext applicationContext)
    {
        LogHelper.Info<Custom404Launcher>("Attaching MyCustomContentFinder as IContentFinder");
        ContentFinderResolver.Current.InsertTypeBefore<ContentFinderByNotFoundHandlers, MyCustomContentFinder>();
    }
}
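
Why does the position in the resolver matter? Here is a toy model of the chain, just to make the idea concrete. It is a minimal sketch and not Umbraco code: ToyRequest, IToyFinder, ByPathFinder, FallbackFinder and ToyPipeline are all hypothetical names I made up, and I am assuming the behaviour described above, where the first finder that returns TRUE ends the search.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins, NOT Umbraco types. They only mimic the
// "first finder that returns true wins" behaviour.
public class ToyRequest
{
    public string Path;
    public string Content; // stays null until a finder sets it
}

public interface IToyFinder
{
    bool TryFindContent(ToyRequest request);
}

// Knows exactly one URL and declines everything else.
public class ByPathFinder : IToyFinder
{
    public bool TryFindContent(ToyRequest request)
    {
        if (request.Path != "/about") return false; // let the others try
        request.Content = "About page";
        return true;
    }
}

// Always answers, so it belongs at the end of the chain.
public class FallbackFinder : IToyFinder
{
    public bool TryFindContent(ToyRequest request)
    {
        request.Content = "Custom 404 page";
        return true;
    }
}

public static class ToyPipeline
{
    // Asks each finder in order and stops at the first one that returns true.
    public static string Resolve(IList<IToyFinder> finders, string path)
    {
        var request = new ToyRequest { Path = path };
        foreach (var finder in finders)
        {
            if (finder.TryFindContent(request)) break;
        }
        return request.Content;
    }
}
```

With ByPathFinder first and FallbackFinder last, "/about" resolves to the about page and anything else ends up on the custom 404 page. Swap the two and the fallback answers every request, which is why you want to think about where you insert your own finder.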

Bing indexing obsolete page of Post.Be

by Damiaan Peeters 11. August 2009 12:27

The national postal service of Belgium has changed its site, and Bing managed to index the obsolete page… Look at the URL: www.post.be/site/nl/obsolete.html. I was curious and tried to find out why Bing gave me the wrong result.

[Image: Bing search for “depost.be”]

OK, I admit. It was not the best search I have ever done. Searching for “post” would have given the right result.

Anyway, this is a screenshot of the web page:

[Image: the “page unavailable” page on Post.be]

See the text at the top? It’s in Dutch and French. Translated into English it says: “This page is not available anymore, click a link below to visit our new site”. I didn’t get it. If the page does not exist anymore, was Bing wrong to index it?
No! Bing got it right (sort of). It was in fact the webmaster of Post.be who forgot something.

Explanation

To a normal visitor, the page you land on is a standard NOT FOUND page. When someone requests a page which doesn’t exist, the server should return a 404 status in its response. The HTML can be customized like the screenshot above for an improved user experience. But crawlers & indexers need the correct status code, in this case a 404.

When you inspect the requests sent to the web server, every request gets a 200 status code: “Successful”. I used Fiddler to verify the page request:

[Image: Fiddler result for the URL]

As anyone can look up on the web, a 200 status code tells the client that the request was successful. Not a single 404 status code can be found in the second column of the Fiddler result.
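
If you don’t have Fiddler at hand, you can do the same check from code. This is a minimal sketch, assuming .NET 4.5 or later for HttpClient; the StatusCheck class and its two methods are names I made up for the example. Automatic redirects are switched off so that a 301 shows up as a 301 instead of as the status of the page it points to.

```csharp
using System;
using System.Net.Http;

public static class StatusCheck
{
    // Returns the raw HTTP status code of a URL without following redirects.
    public static int GetStatusCode(string url)
    {
        var handler = new HttpClientHandler { AllowAutoRedirect = false };
        using (var client = new HttpClient(handler))
        using (var response = client.GetAsync(url).GetAwaiter().GetResult())
        {
            return (int)response.StatusCode;
        }
    }

    // Explains what the three status codes discussed in this post tell a crawler.
    public static string Describe(int statusCode)
    {
        switch (statusCode)
        {
            case 200: return "OK: the page exists, so a crawler may index it";
            case 301: return "Moved Permanently: a crawler should index the new URL instead";
            case 404: return "Not Found: a crawler should drop the URL";
            default: return "Other status: " + statusCode;
        }
    }
}
```

Calling StatusCheck.Describe(StatusCheck.GetStatusCode(url)) for the obsolete page would have printed the 200 line at the time of writing, which is exactly the problem.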

A user centric solution

For a better user experience, it would have been nice of “De Post” to redirect me automatically to the new homepage.  The page I was visiting here was obviously an obsolete page.  This can be done in HTML using a Meta Refresh Tag.  Looking at the HTML source, no redirect is made.

[Image: HTML source of the web page]

It would have been more user friendly, but not the best solution.

What about 301

A 404 would have been a good start. Bing, Google and other search engines would not list the page in their SERPs for long. I guess the webmaster of “De Post” must have had some reason not to use a 301. A 301 is a status code telling the visitor (human or not) that the page has moved permanently to a new place. Together with this status code, the new URL is supplied in the Location header.

What exactly happens when a 301 is sent from the web server back to the visitor?

A human visitor

The browser (Internet Explorer, Firefox, …) of the visitor will notice the 301 status code and load the new page supplied with the 301 status code.  The end user might notice a new page is loaded because the shown URL in the browser will be updated.

A Bot / Spider

A spider or a bot will also notice the 301, remove the old URL from its index and replace it with the new URL. Using a 301 has major advantages when migrating to a new site because you can retain all page rank from any incoming links.
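
To make the difference between a 301 and a 404 concrete, here is a minimal sketch of the decision a server has to make for each incoming path. It uses the HttpListener class from the .NET Framework only to keep the example self-contained. The RedirectDemo class and the “/nl/” target are assumptions of mine; I have no idea what the real mapping on Post.be should be. Any path that is not in the table gets a 404 here, which a real site would of course only do after checking its actual pages.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

public static class RedirectDemo
{
    // Hypothetical table of retired paths and their replacements.
    static readonly Dictionary<string, string> Moved = new Dictionary<string, string>
    {
        { "/site/nl/obsolete.html", "/nl/" }
    };

    // Returns 301 plus the new location for a retired path, otherwise 404.
    public static int Decide(string path, out string location)
    {
        if (Moved.TryGetValue(path, out location)) return 301;
        location = null;
        return 404;
    }

    // Writes the decision to the response: status code and, for a 301,
    // the Location header that browsers and spiders follow.
    public static void Handle(HttpListenerContext context)
    {
        string location;
        context.Response.StatusCode = Decide(context.Request.Url.AbsolutePath, out location);
        if (location != null)
        {
            context.Response.RedirectLocation = location;
        }
        context.Response.Close();
    }
}
```

The part that matters is Decide: a retired page answers with a 301 and a new address, and everything that really does not exist answers with a 404. The friendly HTML for the human visitor can still be sent along with either status.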

Who.I.am

Certified Umbraco Master, Part of Umbraco Certified partner comm-it, .Net and Azure developer, seo lover. Magician in my spare time.
