Bing indexing obsolete page of Post.Be

by Damiaan Peeters 11. August 2009 12:27

The national postal service of Belgium has changed its site, and Bing managed to index the obsolete page…  Look at the URL: www.post.be/site/nl/obsolete.html.  I was curious and tried to find out why Bing gave me the wrong result.

[Screenshot: Bing search results for "depost.be"]

OK, I admit it.  It was not the best search I have ever done.  Searching for “post” would have given the right result.

Anyway, this is a screenshot of the web page:

[Screenshot: the "page unavailable" page on Post.be]

See the text at the top?  It’s in Dutch and French.  Translated into English it says: “This page is not available anymore, click a link below to visit our new site”.   I didn’t get it.  If the page does not exist anymore, was Bing wrong to index it?
No!  Bing got it right (sort of).  It was in fact the webmaster of Post.be who forgot something.

Explanation

For a normal visitor, the page you land on is a standard NOT FOUND page.  When someone requests a page that doesn’t exist, the server should return a 404 status in its response.  The HTML can be customized, as in the screenshot above, for a better user experience.  But crawlers and indexers need the correct status: in this case, a 404.
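
To make this concrete, here is a minimal sketch in Python of a custom "not found" page that still sends a real 404.  It is a plain WSGI application, not what Post.be actually runs; the names `app`, `KNOWN_PAGES` and `NOT_FOUND_HTML` are made up for the illustration.

```python
# Hypothetical set of paths that still exist on the site.
KNOWN_PAGES = {"/": b"<h1>Home</h1>"}

# Friendly page for humans; the text mirrors the message in the screenshot.
NOT_FOUND_HTML = (
    b"<html><body><p>This page is not available anymore, "
    b"click a link below to visit our new site.</p></body></html>"
)


def app(environ, start_response):
    """Serve known pages with 200 and everything else with a real 404."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if path in KNOWN_PAGES:
        start_response("200 OK", headers)
        return [KNOWN_PAGES[path]]
    # The body is customized, but the status line still says 404.
    start_response("404 Not Found", headers)
    return [NOT_FOUND_HTML]
```

The visitor gets the same friendly text either way; only the status line differs, and that is the part a crawler looks at.  To try it locally, the standard library's `wsgiref.simple_server.make_server` can serve `app`.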

When you inspect the requests sent to the web server, every request gets a 200 status code: “Successful”.  I used Fiddler to verify the page request:

[Screenshot: Fiddler results for the URL]

As anyone can look up on the web, a 200 status tells the visitor that the request was successful.  Not a single 404 status code can be found in the second column.
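
If you don't have Fiddler at hand, the same check can be sketched in a few lines of Python.  This is only a rough illustration: `fetch_status`, `looks_like_soft_404` and the phrases in `ERROR_PHRASES` are my own assumptions, and a real check would need phrases for each site and language.  A page that answers 200 but reads like an error page is sometimes called a "soft 404".

```python
import http.client


def fetch_status(host, path):
    """Return (status, body) for one GET request, without following redirects."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.read().decode("utf-8", errors="replace")
    finally:
        conn.close()


# Hypothetical phrases that suggest an error page; adjust per site.
ERROR_PHRASES = ("not available anymore", "page not found")


def looks_like_soft_404(status, body):
    """True when the server says 200 but the body reads like an error page."""
    text = body.lower()
    return status == 200 and any(phrase in text for phrase in ERROR_PHRASES)
```

Something like `looks_like_soft_404(*fetch_status("www.post.be", "/site/nl/obsolete.html"))` would run the check against the page from this post, assuming the page is still served the way it was when I looked.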

A user centric solution

For a better user experience, it would have been nice of “De Post” to redirect me automatically to the new homepage.  The page I was visiting was obviously an obsolete page.  This can be done in HTML using a meta refresh tag.  Looking at the HTML source, however, no such redirect is made.

[Screenshot: HTML source of the web page]

A meta refresh would have been more user friendly, but it is still not the best solution.
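
For reference, a meta refresh is a single tag in the page's head.  The small Python sketch below shows what such a tag looks like and how you could check a page's source for one.  The URL in `SAMPLE` is a placeholder, not the real new homepage, and `MetaRefreshFinder` and `find_meta_refresh` are names I made up.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collect the content attribute of any <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute values keep their case, so compare case-insensitively.
        if (attributes.get("http-equiv") or "").lower() == "refresh":
            self.targets.append(attributes.get("content"))


def find_meta_refresh(html):
    """Return the content values of all meta refresh tags found in html."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.targets


# Hypothetical page that sends the visitor to a new homepage after 0 seconds.
SAMPLE = (
    '<html><head><meta http-equiv="refresh" '
    'content="0; url=http://www.example.com/"></head></html>'
)
```

Calling `find_meta_refresh` on the source of the obsolete page would presumably return an empty list, since I could not find any redirect in it.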

What about a 301?

A 404 would have been a good start.  Bing, Google and other search engines would then not keep the result in their SERPs for long.  I guess the webmaster of “De Post” must have had some reason not to use a 301.  A 301 is a status code telling the visitor (human or not) that the page has moved permanently to a new place.  The new URL is supplied together with this status code, in the Location header.
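
As a rough sketch of what a 301 response looks like on the server side, here is another small WSGI application in Python.  The `MOVED` table, its paths and the `redirect_app` name are hypothetical; I don't know how the old Post.be URLs map to the new site.

```python
# Hypothetical mapping from retired paths to their new locations.
MOVED = {"/old/page.html": "http://www.example.com/new/page.html"}


def redirect_app(environ, start_response):
    """Answer retired paths with a 301 and a Location header, the rest with 404."""
    path = environ.get("PATH_INFO", "/")
    new_url = MOVED.get(path)
    if new_url is not None:
        # The status says "moved permanently"; Location says where to.
        start_response(
            "301 Moved Permanently",
            [("Location", new_url), ("Content-Length", "0")],
        )
        return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Not found"]
```

The design choice here is one entry per retired URL, so each old page can point to its closest replacement instead of everything landing on the homepage.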

What exactly happens when a 301 is sent from the web server back to the visitor?

A human visitor

The visitor’s browser (Internet Explorer, Firefox, …) will notice the 301 status code and load the new page whose URL is supplied with it.  The end user might notice that a new page is loaded because the URL shown in the browser is updated.
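
The browser's side of this can be imitated with a toy example.  The Python sketch below does not touch the network: `RESPONSES` is a hypothetical in-memory stand-in for a server, and `follow_redirects` is a simplified illustration, not how any real browser is implemented.

```python
# Hypothetical in-memory "server": URL -> (status, Location header or None).
RESPONSES = {
    "http://old.example.com/page": (301, "http://new.example.com/page"),
    "http://new.example.com/page": (200, None),
}


def follow_redirects(url, responses, max_hops=5):
    """Return the final URL and status after following 301 responses."""
    for _ in range(max_hops):
        status, location = responses.get(url, (404, None))
        if status != 301 or location is None:
            return url, status
        url = location  # this is the URL the address bar would now show
    raise RuntimeError("too many redirects")
```

The `max_hops` limit is there because two pages that redirect to each other would otherwise keep the client busy forever.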

A Bot / Spider

A spider or bot will also notice the 301, remove the old URL from its index and replace it with the new URL.  Using a 301 has major advantages when migrating to a new site because you can retain the page rank from incoming links.
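
To illustrate the idea, and only the idea, here is a toy index in Python.  Real search engines are far more complicated than this; `update_index` and the numeric "score" are stand-ins I invented, not the actual page rank calculation of Bing or Google.

```python
def update_index(index, url, status, location=None):
    """Return a new index after crawling url.  index maps URL -> score."""
    updated = dict(index)
    if status == 301 and location:
        # Move the entry, adding its score to whatever the new URL already has.
        score = updated.pop(url, 0)
        updated[location] = updated.get(location, 0) + score
    elif status == 404:
        # The page is gone, so the entry is dropped and its score is lost.
        updated.pop(url, None)
    # Any other status (such as 200) leaves the entry where it is.
    return updated
```

Note the last case: with a 200, the old URL simply stays in the index, which is what happened to the obsolete page of Post.be.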

