White House website and the Robots.txt file

Does the White House have something to hide?

Senator Hiram Johnson famously quipped
(http://www.bartleby.com/73/1925.html) that
"the first casualty when war comes is the truth." As the war in Iraq
continues, is the White House intentionally preventing search engines
from preserving a record of its statements on the conflict with an
egregious robots.txt file? Or did its staff simply make a technical
mistake?

When search engines "spider" the web in search of documents for their
indices, web site owners sometimes publish a file called robots.txt
(http://www.robotstxt.org/wc/robots.html), which instructs the
"spiders" not to index certain files. This can be for policy reasons,
if an author does not want his or her pages to appear in search
listings, or for technical reasons, for example if a web site is
dynamically generated and cannot or should not be downloaded in its
entirety.
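
As a rough illustration of how a well-behaved spider reads such a file, here is a minimal sketch using Python's standard urllib.robotparser module. The rules, the example.com addresses, and the "ExampleBot" name are all made up for this example; they are not taken from the White House file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; not the White House's file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
Disallow: /cgi-bin/search
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# A polite spider asks before fetching each address.
allowed_home = parser.can_fetch("ExampleBot", "http://www.example.com/index.html")
allowed_draft = parser.can_fetch("ExampleBot", "http://www.example.com/drafts/memo.html")

print(allowed_home)   # True: no rule matches /index.html
print(allowed_draft)  # False: the path starts with /drafts/
```

Note that the file is only a request: nothing stops a spider that chooses to ignore it.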

According to reports
(http://yro.slashdot.org/article.pl?sid=03/10/27/2052228&mode=nested&tid=103&tid=126&tid=95&tid=99&threshold=2),
though, the White House is requesting that search engines not index
certain pages related to Iraq (http://www.bway.net/~keith/whrobots/).
In addition to keeping those pages out of search results, this
prevents archives like Google's cache
(http://www.google.com/help/features.html#cached) and the Internet
Archive from storing copies of pages that may later change. 2600
called the White House to investigate the matter.

According to White House spokesman Jimmy Orr, the blocking of search
engines is not an attempt to ensure future revisions will remain
undetected. Rather, he explained, they "have an Iraq section [of the
website] with a different template than the main site." Thus, for
example, a press release on a meeting between President Bush and
"Special Envoy" Bremer is available in the Iraq template
(http://www.whitehouse.gov/news/releases/2003/10/iraq/20031027-1.html,
blocked from being indexed by search engines) or the normal White
House template
(http://www.whitehouse.gov/news/releases/2003/10/20031027-1.html,
available for indexing by search engines). The intent, Mr. Orr said,
was that people who search should not get multiple copies of the same
information. Most of the "suspicious" entries in the robots.txt file
do, indeed, appear to have only this effect.
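
The "duplicate template" explanation can be checked mechanically. The sketch below assumes a single Disallow rule written for illustration (the article does not quote the file's actual entries) and assumes, based on the one press release above, that the normal-template address is the Iraq-template address with the "iraq" path segment removed. The helper names are hypothetical. It only tests what the rules permit; confirming that the twin page really exists would take a live request.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Assumed rule for illustration; the real file's entries may differ.
ASSUMED_RULES = """\
User-agent: *
Disallow: /news/releases/2003/10/iraq
"""

IRAQ_URL = "http://www.whitehouse.gov/news/releases/2003/10/iraq/20031027-1.html"


def normal_template_url(iraq_url: str) -> str:
    """Drop the 'iraq' path segment to get the normal-template address."""
    parts = urlsplit(iraq_url)
    segments = [s for s in parts.path.split("/") if s != "iraq"]
    return urlunsplit(parts._replace(path="/".join(segments)))


def is_duplicate_only_block(parser: RobotFileParser, iraq_url: str,
                            agent: str = "ExampleBot") -> bool:
    """True when the Iraq copy is blocked but its twin may be crawled."""
    blocked = not parser.can_fetch(agent, iraq_url)
    twin_allowed = parser.can_fetch(agent, normal_template_url(iraq_url))
    return blocked and twin_allowed


parser = RobotFileParser()
parser.parse(ASSUMED_RULES.splitlines())

print(normal_template_url(IRAQ_URL))
print(is_duplicate_only_block(parser, IRAQ_URL))
```

An entry for which this check fails, because no unblocked twin exists, is the kind that deserves a closer look.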

According to the robots.txt of October 24
(http://www.bway.net/~keith/whrobots/robotsWHcurrent10-24-03.txt),
though, the In Focus: Iraq section of the site
(http://www.whitehouse.gov/infocus/iraq/) was blocked from search
engines. Some of the information there
(http://www.whitehouse.gov/infocus/iraq/kay-20031008.html) does not
appear to be available anywhere else on the White House site
(http://www.whitehouse.gov/query.html?col=colpics&qt=%2B%22david+kay%22+%2B%22interim+progress%22&submit.x=0&submit.y=0).
However, it seems that, in response to inquiries from 2600 and other
sources, the White House web team has recently changed its robots.txt
(http://www.whitehouse.gov/robots.txt) so that these files are no
longer blocked. (The current Last-Modified
(http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.29)
date on the robots.txt is 23:22 GMT, October 27th, after work on this
article had already begun.)
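
For readers who want to repeat the Last-Modified comparison, here is a small sketch that parses an HTTP date and compares it with the date of an archived copy. The header string is reconstructed from the time quoted above: the year 2003 is inferred from the press release addresses, and the seconds field and the snapshot's time of day are assumptions. The function name changed_since is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def changed_since(last_modified_header: str, snapshot_time: datetime) -> bool:
    """True if the server's copy was modified after the archived snapshot."""
    modified = parsedate_to_datetime(last_modified_header)
    return modified > snapshot_time


# Reconstructed header value; year inferred, seconds assumed to be :00.
header = "Mon, 27 Oct 2003 23:22:00 GMT"

# The archived copy is dated October 24; midnight UTC is an assumption.
snapshot = datetime(2003, 10, 24, tzinfo=timezone.utc)

print(changed_since(header, snapshot))
```

A server can report any Last-Modified value it likes, so this is a hint about when the file changed rather than proof.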

Whether the original blocking of the content in question was malicious
or an honest mistake is, of course, open to speculation. Certainly
anyone who maintains a large website has made some sort of technical
mistake at least once, and the promptness with which the error was
fixed after it was pointed out suggests that the White House had no
interest in keeping it in place. The White House, as an entity
responsible to the citizenry and one that has drawn a lot of
criticism over its handling of the situation in Iraq, ought to take
special care to avoid similar mistakes in the future. Nonetheless, we
are pleased to learn that, at least this time, the issue seems to have
been resolved promptly.

2 comments
  1. jacob phillips says: Nov 09, 2003, 11:33 pm

    you may also enjoy reading this:
    http://condi.topcities.com/whrobots/index.html

  2. taylo says: May 10, 2005, 3:22 am

    I have a problems with this error when trying to go to windows updated site Error Ox80072EEZ.
