On Wednesday I had a frustrating experience trying to help track down a problem in a post on a blog I subscribe to. It ended happily enough when we located a syntax error in the HTML of the post, but not before we had explored several blind alleys.
Before discussing the implications, I will first present some (lightly edited) excerpts from our correspondence, to set the stage.
-----
Me: R&D Funding post. A link near the end seems to be broken.
Peter: I'm looking at the post and trying to find the broken link without success. They all seem to be working for me. Maybe you caught it as I was rebuilding the page? If not, can you give me more context for the link?
Me: It's the link associated with the text "de-emphasizing long-term, fundamental research", and it still takes me to "http://www.cra.org/govaffairs/blog/archives/increasing", which still yields an "HTTP 404 - File not found" message.
Peter: Hmmm. I'm at a loss. I've had a few people on varied platforms test it without problem. Could it be that you're looking at a cached version?
The link that shows in my browser (and my test subjects) is:
http://www.cra.org/govaffairs/images/trends_in_dod_snt.jpg
A mystery....
Me: I can follow the link in your email just fine, but not from the post.
Must be a poisoned cache (perhaps caught while you were rebuilding). CTRL-F5 is SUPPOSED to force a refresh, but I've had trouble with this on our corporate network before. I'll try it from home and see if I get the same results.
Me: It's even more weird than I thought, and not simple cache poisoning.
I did a View Source on the page, while it was exhibiting the problem. In the resulting HTML, the link is exactly what you say it is. However, when I mouse over the link, I still get what I told you.
To investigate further, I saved the HTML to my local disk and repeated the experiment. Same results, except that now it looks for and fails to find a local file ("increasing" in the same directory as the main page). I cleared the browser cache and refreshed. Same results. Out of superstition, I quoted the URL in the link. Same result.
This is truly bizarre, and, as nearly as I can tell, affects only this one link.
Me: Just looked at a larger fragment of the HTML, and the problem is obvious (note the "href=increasing"). Is this corrupted relative to what you have published?
As others have <a href= http://www.aaas.org/spp/rd/upd1104.htm> noted</a>, the bulk of that 44% increase has gone to the Defense Department, which is <a href=increasing it's support for more short-term, development-oriented research and <a href= http://www.cra.org/govaffairs/images/trends_in_dod_snt.jpg> de-emphasizing long-term, fundamental research</a>. <a href=http://www.cra.org/govaffairs/defense.php>Here's more</a> on CRA's concerns about DOD research. </p>
Peter: Aha. You've found the problem!
I had intended to add a link to that "increasing it's support for more short-term, development-oriented research" phrase but changed my mind. Unfortunately, as you discovered, I left a fragment of the "<a href=" tag in the text. Apparently Safari and the browsers my "testers" used didn't stumble on the fragment -- they just ignored everything between the tag fragment and the correctly formatted "<a href>" tag that starts with "de-emphasizing." But that must not be the case with all browsers.
I didn't catch it looking at the rendered HTML because the sentence still made sense without the "increasing it's support..." phrase.
Anyway, seems to work now with the tag fragment deleted. Does it work for you?
-----
So what is the lesson in this? There was clearly a syntax error, so what we got is what we deserved, right? I think not.
Given the frequency of errors in HTML, it would be unreasonable for renderers to refuse to display pages with errors. (I found the HTML bug in the first version of this post only with great difficulty.) However, we stumbled around blindly because none of the browsers we were using gave any hint that there was a syntax error on the page. Each just silently "corrected" the error. Unfortunately, but predictably, they didn't all "correct" it in the same way, meaning that Peter and his testers were getting one result, and I was getting another.
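To make the divergence concrete, here is a minimal Python sketch of two ways a lenient parser might silently "correct" the stray tag fragment. The two strategies and the names `href_candidates`, `recover_first`, and `recover_last` are hypothetical illustrations; they are not the actual recovery algorithms of Safari or of any other browser. The fragment is a shortened copy of the markup quoted in the correspondence.

```python
import re

# Shortened copy of the malformed markup quoted in the correspondence.
FRAGMENT = (
    "which is <a href=increasing it's support for more short-term, "
    "development-oriented research and "
    "<a href= http://www.cra.org/govaffairs/images/trends_in_dod_snt.jpg> "
    "de-emphasizing long-term, fundamental research</a>."
)


def href_candidates(html):
    """Collect every href value between the first '<a' and the first '>'."""
    start = html.index("<a")
    end = html.index(">", start)
    return re.findall(r"href=\s*([^\s>]+)", html[start:end])


def recover_first(html):
    # Hypothetical strategy A: believe the first href seen in the tag.
    return href_candidates(html)[0]


def recover_last(html):
    # Hypothetical strategy B: keep the last href seen, as if the parser
    # had restarted at the second '<a'.
    return href_candidates(html)[-1]


print(recover_first(FRAGMENT))  # increasing
print(recover_last(FRAGMENT))   # http://www.cra.org/govaffairs/images/trends_in_dod_snt.jpg
```

Strategy A reproduces the broken relative link I kept seeing, and strategy B reproduces the working link Peter and his testers saw. Both are defensible guesses, which is why neither of us suspected the markup.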
I contend that all of the browsers were wrong not to indicate clearly that the page contained a syntax error. A friendly browser would even have made some attempt to indicate the approximate location of the error on the page.
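As a rough illustration of how little it takes to report such an error and its location, here is a small Python sketch. It assumes a deliberately crude rule: a "<" that is followed by another "<" before any ">" marks a tag that was never closed. The function names and the sample `PAGE` (condensed from the quoted fragment) are my own, and the check ignores comments, scripts, quoted attribute values, and a literal "<" in running text, so it is a sketch of the idea rather than a real HTML validator.

```python
def locate(text, pos):
    """Convert a character offset into a 1-based (line, column) pair."""
    line = text.count("\n", 0, pos) + 1
    column = pos - text.rfind("\n", 0, pos)
    return line, column


def unclosed_tags(text):
    """Return the (line, column) of every '<' that never reaches its own '>'."""
    problems = []
    open_at = None  # offset of the most recent '<' still waiting for '>'
    for pos, ch in enumerate(text):
        if ch == "<":
            if open_at is not None:
                # A new tag starts before the previous one was closed.
                problems.append(locate(text, open_at))
            open_at = pos
        elif ch == ">":
            open_at = None
    if open_at is not None:
        problems.append(locate(text, open_at))
    return problems


# Sample page condensed from the fragment quoted above (an assumption,
# not the full original post).
PAGE = (
    "<p>As others have <a href=http://www.aaas.org/spp/rd/upd1104.htm>noted</a>,\n"
    "the Defense Department is <a href=increasing it's support and\n"
    "<a href=http://www.cra.org/govaffairs/defense.php>Here's more</a>.</p>\n"
)

for line, column in unclosed_tags(PAGE):
    print(f"warning: tag opened at line {line}, column {column} is never closed")
```

On the sample page this prints a single warning pointing at line 2, column 27, which is where the stray "<a href=" begins. The page can still be rendered however the browser likes; the point is only that the user is told something is wrong and roughly where.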
Although it was published more than thirty years ago, I think my advice on "What the Compiler Should Tell the User" (in Compiler Construction, an Advanced Course, F. L. Bauer and J. Eickel (eds.), Springer-Verlag, pp. 525–548, 1974) is still pertinent to those who build compilers and other formal language interpreters. Those who do not study the past are very likely not to learn its lessons, and therefore to repeat old mistakes.