The PageRank files can
be requested directly from the domain www.google.com. Basically,
the URLs for those files look as follows (the actual URL contains no line breaks):
http://www.google.com/search?
client=navclient-auto&
ch=0123456789&
features=Rank&
q=info:http://www.domain.com/
Each PageRank file contains only one line of text. The last
digit in this line is the PageRank value.
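As a minimal sketch of reading such a file, the following Python function returns the number at the end of the line. The sample line "Rank_1:1:5" and the function name extract_pagerank are assumptions made for illustration; the text above only states that the last digit is the PageRank. The function takes the whole trailing run of digits, which is a further assumption so that a two-digit value would also be read in full.

```python
import re


def extract_pagerank(line: str) -> int:
    """Return the integer formed by the digits at the very end of the line."""
    # Trailing whitespace (e.g. a final newline) is ignored.
    match = re.search(r"(\d+)\s*$", line)
    if match is None:
        raise ValueError("line does not end in a digit")
    return int(match.group(1))


# Hypothetical file content; the real layout of the line may differ.
sample_line = "Rank_1:1:5"
pagerank = extract_pagerank(sample_line)
```

With the hypothetical sample line, pagerank is 5.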
The parameters in the URL shown above are required
for displaying the PageRank files in a browser. The value "navclient-auto"
for the parameter "client" identifies the Toolbar. The parameter "q"
submits the URL of the page in question. The value "Rank"
for the parameter "features" determines that the PageRank
files are requested. If it is omitted, Google's servers still transmit
XML files. The parameter "ch" transfers a checksum for
the URL to Google. For a given URL, this checksum can only change when the
Toolbar version is updated by Google.
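The way these four parameters combine into a request URL can be sketched in Python as follows. The function name build_pagerank_url and its host argument are hypothetical, the checksum is the placeholder value from the example above, and the checksum algorithm itself is not described in this article, so it has to be supplied by the caller. The sketch only assembles a string and sends nothing.

```python
from urllib.parse import urlencode


def build_pagerank_url(page_url: str, checksum: str,
                       host: str = "www.google.com") -> str:
    """Assemble a PageRank file URL from the parameters described above."""
    params = [
        ("client", "navclient-auto"),  # identifies the Toolbar
        ("ch", checksum),              # checksum for page_url, supplied by the caller
        ("features", "Rank"),          # asks for the PageRank file
        ("q", "info:" + page_url),     # the page whose PageRank is wanted
    ]
    # Keep ":" and "/" unescaped so the result matches the URL form shown above.
    return "http://" + host + "/search?" + urlencode(params, safe=":/")


example_url = build_pagerank_url("http://www.domain.com/", "0123456789")
```

Here example_url is the same URL as in the example above, written on a single line.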
Thus, it is necessary to install the Toolbar at least once to find
out the checksums of one's URLs. To track the communication
between the Toolbar and Google, the use of packet sniffers,
local proxies and similar tools is often suggested. But this is not strictly
necessary, since the PageRank files are cached by Internet Explorer.
So, the checksums can simply be looked up in
the folder Temporary Internet Files. Knowing the checksums of your
URLs, you can view the PageRank files in your browser without
having to accept Google's cookies, which last for 36 years.
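Once the URL of a cached PageRank file has been copied from that folder, the checksum can be read from its query string. The following Python sketch assumes the cached URL has the form shown earlier; the function name checksum_from_cached_url and the sample URL are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def checksum_from_cached_url(cached_url: str) -> tuple[str, str]:
    """Return (page URL, checksum) taken from a cached PageRank file URL."""
    query = parse_qs(urlsplit(cached_url).query)
    checksum = query["ch"][0]
    # The "q" parameter carries the page URL behind an "info:" prefix.
    page_url = query["q"][0].removeprefix("info:")
    return page_url, checksum


# Hypothetical cache entry, using the placeholder checksum from the example above.
cached = ("http://www.google.com/search?client=navclient-auto"
          "&ch=0123456789&features=Rank&q=info:http://www.domain.com/")
page, ch = checksum_from_cached_url(cached)
```

For the sample entry, page is "http://www.domain.com/" and ch is "0123456789". Note that str.removeprefix requires Python 3.9 or later.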
Because the PageRank files are kept in the browser cache and are
therefore plainly visible, viewing them in a browser should not be
a violation of Google's Terms of Service, as long as requests are
not automated. However, you should be cautious. The
Toolbar submits its own User-Agent to Google. It is:
Mozilla/4.0 (compatible; GoogleToolbar 1.1.60-deleon; OS SE 4.10)
Here, 1.1.60-deleon is the Toolbar version, which may of course change,
and OS stands for the operating system that you have installed. So, Google is
able to identify requests made by browsers, provided that they do not go out via
a proxy and the User-Agent has not been modified accordingly.
Looking at IE's cache, one will normally notice that the
PageRank files are requested not from the domain www.google.com
but from IP addresses like 216.239.33.102. Additionally, the URLs
of the PageRank files often contain a parameter "failedip" that
is set to values like "216.239.35.102;1111" (its function
is not entirely clear). Each of the IP addresses belongs to one
of Google's seven data centers, and the Toolbar most likely queries
IP addresses directly so that the PageRank display can be controlled
better, especially during the "Google Dance".
3. The Implementation of PageRank (continued)
This article reproduced with permission of eFactory.
© 2002 eFactory Internet-Agentur KG Online-Marketing - written
by Markus Sobek
PageRank and Google are trademarks of Google Inc., Mountain View, CA,
USA.
PageRank is protected by US Patent 6,285,999.