Karakas Online

5.15.2. Cool labels don't change!

There is one thing to keep in mind when regrouping: chapter and section labels - DON'T change existing ones! You may change their position, but please not the label. Subsection and subsubsection labels can be changed without problem.

Why? Because chapters and sections will become separate HTML documents. There is a DSSSL stylesheet setting which controls down to which nesting level a separate HTML document is still produced (see the description of chunk-section-depth in Section 7.1.5). The name of each document will be the label of the respective chapter or section. This is a behaviour we explicitly set in the DSSSL stylesheets (see Section 7.1.5) through the use-id-as-filename DSSSL parameter.

Obviously, you can move a section around, without affecting the HTML name of the resulting file, if you don't change its label. You can of course change its contents, put it somewhere else as a section of a different chapter etc., but you should leave the label untouched.

The problem is that, if the document is already on the Web and is receiving a lot of visits, most of them (experience suggests a number around two-thirds of the total number of visits) will be from search engines. If you change the section label, the HTML name changes. Consequently, the link from the search engines is no longer valid. The same is true for private bookmarks, or public bookmark lists.

But there is more to it: it's not only a matter of waiting 2-3 weeks for the new HTML document that contains the old (a bit reorganized) content to be indexed by the search engines. It's that the old name might have been on page 1 of the SERPs (Search Engine Results Pages) of some search engine for some keyword, because a lot of other people linked to it. Now, with a changed label and, consequently, a new HTML file name, those links do not reference the new document, and it gets a ranking close to "nowhere" (because search engines, notably Google, take links to a document to mean "votes" for that document and rank that document accordingly). The result: nobody finds it.

Note: It's not a question of sacrificing quality!

I am not trying to tie your hands in favour of a higher search engine ranking here! This discussion is not one of quality vs. ranking, but one of consistency. All I am advocating is: keep your labels (and consequently your filenames) consistent between various releases of your document! Once you have chosen a label for a chapter or section, stick with it.

Please also note that we are not talking about the title of a chapter or section, but its label. "Label" is LyXese for the "SGML id". You get a label from the "Insert -> Label" menu of LyX. Don't confuse title and label in this discussion!
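Checking that no label has silently disappeared between two releases can be automated. The following is only a minimal sketch, not part of the LyX or DocBook toolchain: it assumes that the exported SGML writes chapter and sect1 start tags with a double-quoted id attribute, as in the examples of this section. The function names and the sample documents are made up for illustration, and a regular expression is a crude stand-in for a real SGML parser.

```python
import re

# Simplified, hypothetical pattern: it only recognizes start tags written as
# <chapter id="..."> or <sect1 id="...">. Real SGML allows many other spellings.
ID_PATTERN = re.compile(r'<(?:chapter|sect1)\s+id="([^"]+)"', re.IGNORECASE)


def labels(sgml_text):
    """Collect the chapter and section labels (SGML ids) found in the text."""
    return set(ID_PATTERN.findall(sgml_text))


def vanished_labels(old_sgml, new_sgml):
    """Return the labels that exist in the old release but not in the new one."""
    return sorted(labels(old_sgml) - labels(new_sgml))


# Two made-up releases: the section was moved AND its label was changed.
old_release = '''
<chapter id="widgets"><title>Widgets</title>
<sect1 id="blue-widgets"><title>Blue widgets</title>
<sect1 id="red-widgets"><title>Red widgets</title>
'''
new_release = '''
<chapter id="widgets"><title>Widgets</title>
<sect1 id="red-widgets"><title>Red widgets</title>
<sect1 id="blue-widgets-2"><title>Blue widgets revisited</title>
'''

print(vanished_labels(old_release, new_release))
```

Moving "red-widgets" to a different position is not reported, because its label is unchanged. Only "blue-widgets" shows up as vanished, which is exactly the kind of change this section warns about.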

For example, suppose a chapter with the title “Blue widgets” ranks at around place 6 out of 2.5 million (!) results for "blue widgets" on Google. If you change the label of the chapter from "blue-widgets" to something else, then the original URI will disappear and you lose readers. Of course, changing the title also affects the SERPs (Search Engine Results Pages), but not so drastically as to eliminate the document altogether. However, Google likes a title that is correlated to the file name, so a title "Blue widgets" and a label "blue-widgets" are optimal from the SEO (Search Engine Optimization) point of view (see Section 5.15.1).

“But you are talking me into subjecting my writing to the whims of a search engine!”, you might counter. Nothing could be further from the truth! The point is not to restrict your writing. The point is: you write whatever you like, however you like and structure it as you please. There are rules for good writing that you might choose to observe, or not. There are also rules for good “copy”, from the point of view of keywords, search engines and ranking - which you are also free to observe or defy.

Then, at some point, you decide to put the document on the web. People will come, read it and, hopefully, find it good. Those people may like your document so much as to go to the trouble of saying something like "there's a cool document on blue widgets in this link here" - and link to it. Hundreds of people may do this perhaps - even thousands. Imagine the effort!

Now you come up with a new restructuring of your document - fine! You change the content - also fine! Then you change the label from:

<sect1 id="blue-widgets"><title>Blue widgets</title>

to:

<sect1 id="blue-widgets-2"><title>Blue widgets revisited</title>

In LyX, this is equivalent to changing the title from “Blue widgets” to “Blue widgets revisited” and the label from “blue-widgets” to “blue-widgets-2”. Perhaps you thought it would be a nice idea to change the title to reflect the reorganization. This will affect your ranking too, but then, almost everything that you write will affect it, so we will not discuss it here. (Actually, it will affect it a little more because it's in the title - but that's again not the point.)

But by changing that label from “blue-widgets” to “blue-widgets-2” you just managed to throw your document from place 6 to place 600 (or 6000, or...) in the SERPs. You just wiped out the efforts of thousands of people who linked to your document!

Why?

Because labels become filenames in the conversion from SGML to HTML (see Chapter 7 for a detailed explanation of this process). The document that used to be blue-widgets.html is now blue-widgets-2.html. The original blue-widgets.html is nowhere to be found in your domain - hundreds, or even thousands of links on the Web now point to nothing!
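To see which inbound links a label change would break, one can compare the file names that other sites link to with the file names the new release will produce. This is just an illustrative sketch under stated assumptions: the chunk file name is taken to be the label plus ".html", as in the blue-widgets example, and the host www.example.org, the /docs/ path and the function names are invented for the illustration.

```python
from posixpath import basename
from urllib.parse import urlparse

# Assumption: chunks are named "<label>.html", as in the example above.
HTML_EXT = ".html"


def chunk_filename(label):
    """File name the chunked HTML output is expected to use for a label."""
    return label + HTML_EXT


def broken_links(inbound_urls, current_labels):
    """Return the inbound URLs whose target file no longer exists."""
    existing = {chunk_filename(label) for label in current_labels}
    return [url for url in inbound_urls
            if basename(urlparse(url).path) not in existing]


# Made-up links that other people placed on their pages.
inbound = [
    "http://www.example.org/docs/blue-widgets.html",
    "http://www.example.org/docs/red-widgets.html",
]

# Labels of the reorganized release: "blue-widgets" became "blue-widgets-2".
new_labels = {"blue-widgets-2", "red-widgets"}

print(broken_links(inbound, new_labels))
```

Only the link to blue-widgets.html is reported as broken. The link to red-widgets.html survives any amount of reorganization as long as its label stays the same.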

Google - and every other search engine - sees this and takes the old URL out of the index. Of course, it indexes the new one. But the new one does not have any links pointing to it - not yet. And perhaps people will not be willing to go to the trouble of changing all their documents, just because you wanted to keep your freedom of choosing (and changing!) the label (and the resulting HTML filename) at your whim. Thus, no one points to the new "reorganized" document. It is rated very low and appears at place... uhmm, 1 million something, out of 2.5 million results for "blue widgets", where nobody will find it and nobody will read it. Remember, the original document ranked at place 6 out of 2.5 million!

You might think that, since the label-to-filename connection exists only for the “chunked” version (the version where openjade is instructed to split the document into separate HTML files, one per chapter or section, the so-called “chunks”, as explained in Section 7.1.4.6), the “unchunked” document will save you from this disaster. You are correct: the "single chunk documents" (the single, big HTML file and the TXT, PDF or PS versions) will not be affected.

If you only make the big HTML file, or the TXT, PDF and PS versions of your document available on the web, then you are not affected. But if you also made the chunked HTML version available at some point, the search engines will prefer to return results from this version rather than from the others.

There are various reasons for this, one of them being that search engines do not read an overly long document to the end and will thus index small chunks much better than huge texts. Another reason is that you need more links to a PDF document to force a search engine to consider it important for indexing.

So forget about the huge, one-chunk docs as a search engine strategy. If you want to be found by the SEs, you must rely on the chunked versions - and perhaps a little on PDF, but only a little.

However, my point goes even further: we are not talking about a user who is searching for a unique, multiple keyword phrase that identifies the content of your reorganized document. We are talking about a user who just searches for two keywords: “blue widgets”. If you change the label, you change the filename of the chunked version. If you do so, the search engine will NOT think "Ahh... the file blue-widgets.html is not there, let's present the huge document that contains all chapters, including the one on blue widgets - at the same ranking place"! There are three reasons why this will not happen - and you should not rely on it:

  1. First, the search engine does not know that blue-widgets.html is just a chunk of some "whole" document, book1.html. There is nothing that a search engine does to find this out - not with today's technology. The two documents are different from the search engine point of view.

  2. Second, the big one, book1.html, contains much more text, therefore the importance of the "blue widgets" chapter is "diluted" by the surrounding, irrelevant text (irrelevant to what the user is searching with those keywords, namely "blue widgets"). This has to do with “keyword density”, titles, structure and other “on-page” factors that the search engine calculates and takes into account for each page. Therefore, the document will rank at a place that is way back - invisible to all but the most determined searchers, practically dead.

  3. Third, if you are a HOWTO author, you may put your document on The Linux Documentation Project, which is a great place with good exposure to the Web, but that alone does not guarantee good ranking. What also matters is that people link to it. But if you change an existing label, thus changing the filename of the chunked version (which is the most important one from the search engine point of view for the reasons stated above), then you kill all the links to the previous URL. You destroy what you were able to gather up to that point in terms of search engine visibility. You start anew.

Tip: Use permanent redirects, if you do choose to change the label!

Let me put a preemptive disclaimer here: I know that you can put an "HTTP permanent redirect" in your .htaccess file to indicate that the resource is now somewhere else, under a different name. But this makes URL management difficult for a webmaster. How on earth shall the webmaster know which labels some author, whose document he hosts on his website, changed in his last reorganization? Is he supposed to do nothing else all day, other than chasing diff outputs and editing .htaccess files? Just because the author wants to keep his freedom of changing labels at his whim? Remember, if it is a free document, it will find its way to other people's websites. People may link to files of those websites containing your document, even more often than they do to yours. When you release a new version with changed labels (and accordingly, filenames), do you send a list of changed labels to all those webmasters who host it, with the request to update their .htaccess files?

Certainly not.

Nevertheless, if you are the author and have decided to change the label of some chapter or section, don't let your web server send an "Error 404: not found" for the old URI. Let it send a “permanent redirect” instead. See Managing URIs and the links therein, for the preferred ways to handle this situation.

But again, a redirect has to stay in your .htaccess file (or web server configuration file) until the last request for the old URI has been seen in your web logs (theoretically, at least, otherwise you lose the visitor). How long is this? One year? Ten years? How big does your .htaccess become if an author starts "reorganizing" his labels(!) every other week? How much of a performance penalty will you have to pay for your web server having to read and process huge .htaccess files on every page request?

Thus, every redirect will hurt you, either in terms of visibility in the SERPs, or in terms of complexity, or both. But if you change a label, don't forget the permanent redirect.
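If you did change labels and want to hand the webmaster something more useful than a diff, the redirect lines can be generated from a list of renamed labels. The sketch below is an assumption-laden illustration, not a recommended setup: it assumes an Apache web server whose mod_alias "Redirect permanent" directive is allowed in .htaccess, chunk files named "<label>.html", and a made-up site http://www.example.org serving the document under /docs/. The function name and the mapping are hypothetical.

```python
def redirect_lines(renamed, site, url_prefix):
    """Build one 'Redirect permanent' line per renamed label.

    renamed    -- dict mapping each old label to its new label
    site       -- scheme and host of the website (assumed, e.g. http://www.example.org)
    url_prefix -- path under which the chunked HTML files are served (assumed)
    """
    lines = []
    for old, new in sorted(renamed.items()):
        old_path = f"{url_prefix}{old}.html"
        new_url = f"{site}{url_prefix}{new}.html"
        lines.append(f"Redirect permanent {old_path} {new_url}")
    return lines


# Hypothetical list of labels changed in the last reorganization.
renamed_labels = {"blue-widgets": "blue-widgets-2"}

for line in redirect_lines(renamed_labels, "http://www.example.org", "/docs/"):
    print(line)
```

Every renamed label adds one more line that the web server has to keep for years, which is the cost described above. An empty mapping, and thus an empty list of redirects, remains the cheapest outcome.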

For the above reasons, you will rarely feel the need to change labels while reorganizing; if you think you must, think twice about it. Your best bet is to choose a label wisely (for the same reasons that you would Choose URIs wisely), perhaps with a name that is a bit more general than you might wish, but will still fit if you choose to change content, or even title, later on.

Cool URIs don't change. Cool labels don't change either.

Last updated Mon Sep 24 01:19:25 CEST 2007 Permalink: http://www.karakas-online.de/mySGML/cool-labels-dont-change.html All contents © 2002-2007 Chris Karakas