algorithm. Lawrence
Page, for instance, describes a method for a special evaluation
of web pages in his PageRank patent specifications (United States
Patent 6,285,999). The starting point for his consideration is that
the random surfer of the Random Surfer Model may get bored and stop
following links with a constant probability, but when he restarts,
he won't take a random jump to any page of the web but will rather
jump to certain web pages with a higher probability than to others.
This behaviour is closer to the behaviour of a real user, who would
more likely use, for instance, directories like Yahoo or ODP as
a starting point for surfing.
If certain web pages are to receive such a special evaluation,
the original PageRank algorithm has to be modified. With an
additional expected value E introduced, the algorithm reads as follows:
PR(A) = E(A) (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
Here, (1-d) is the probability that the random surfer stops
following links. E(A) is the probability that the random surfer goes
to page A after he has stopped following links, multiplied by the
total number of web pages. E is thus another expected value whose
average over all pages is 1. In this way, the average of the PageRank
values of all pages of the web continues to converge to 1. The
PageRank values therefore do not vacillate because of the special
evaluation of web pages, and the impact of PageRank on the general
ranking of web pages remains stable.
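The modified formula can be sketched as a simple iteration. The
following Python snippet is only an illustration under stated
assumptions, not Google's implementation: the function name
`pagerank_with_jump_preferences`, the dictionary-based link structure
and the fixed number of iterations are hypothetical choices, and every
page is assumed to have at least one outbound link.

```python
def pagerank_with_jump_preferences(links, jump_prob, d=0.85, iterations=100):
    """Iterate PR(A) = E(A) (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)).

    links:     dict mapping each page to the list of pages it links to
               (assumption: no page has an empty list)
    jump_prob: dict mapping each page to the probability that the surfer
               restarts there; the values are assumed to sum to 1
    """
    n = len(links)
    # E is the restart probability multiplied by the number of pages,
    # so the average of E over all pages is 1.
    e = {page: jump_prob[page] * n for page in links}
    # Start every page at 1 so that the total equals the number of pages.
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        new_pr = {}
        for page in links:
            # Sum PR(T)/C(T) over all pages T that link to this page.
            inbound = sum(
                pr[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_pr[page] = e[page] * (1 - d) + d * inbound
        pr = new_pr
    return pr
```

Because E averages 1 and each page passes on its full PageRank, the
sum of all values stays equal to the number of pages in every step of
this sketch.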
In our example, we set the probability that the random surfer goes
to page A after he has stopped following links to 0.1. The
probability that he goes to page B is set to 0.9. Since our web
consists of two pages, E(A) equals 0.1 × 2 = 0.2 and E(B) equals
0.9 × 2 = 1.8. With a damping factor d of 0.5, we get the following
equations for the PageRank values of the individual pages:
PR(A) = 0.2 × 0.5 + 0.5 × PR(B)
PR(B) = 1.8 × 0.5 + 0.5 × PR(A)
If we solve these equations we get the following PageRank values:
PR(A) = 11/15
PR(B) = 19/15
The sum of the PageRank values remains 2. The higher probability
for the random surfer jumping to page B is reflected by its higher
PageRank. However, the uniform interlinking between both pages
prevents our intervention from having a more significant impact on
the PageRank values of our example pages.
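The two equations can be checked with a few lines of exact arithmetic.
The sketch below uses Python's `fractions` module; the variable names
are chosen only for illustration. It substitutes the equation for
PR(B) into the equation for PR(A) and solves for PR(A).

```python
from fractions import Fraction

d = Fraction(1, 2)            # damping factor
e_a = Fraction(1, 10) * 2     # E(A) = 0.1 * 2 pages = 0.2
e_b = Fraction(9, 10) * 2     # E(B) = 0.9 * 2 pages = 1.8

# PR(A) = E(A) (1-d) + d PR(B)
# PR(B) = E(B) (1-d) + d PR(A)
# Substituting PR(B) into the first equation gives
# PR(A) (1 - d^2) = (1-d) (E(A) + d E(B)).
pr_a = (1 - d) * (e_a + d * e_b) / (1 - d * d)
pr_b = e_b * (1 - d) + d * pr_a

print(pr_a, pr_b, pr_a + pr_b)  # 11/15 19/15 2
```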
So, it is possible to implement the special evaluation of certain
web pages into the PageRank algorithm without having to change it
fundamentally. It remains an open question, however, which criteria
are used for the evaluation. Lawrence Page explicitly suggests the
utilization of real usage data in his PageRank patent specifications.
Google, meanwhile, collects usage data by means of the Google Toolbar.
And Google would not even need as much data as it would if the whole
ranking were based solely on usage data. A limited sample would be
sufficient to determine the 1,000 or 10,000 most important pages on
the web. The PageRank algorithm can then fill the gaps in the usage
data and is thereby able to deliver a more accurate picture of the web.
Of course, all statements regarding the influence of real usage
data on PageRank are pure speculation. Whether there is any special
evaluation of certain web pages at all will, in the end, remain a
secret of the people at Google.
8.
The Yahoo Bonus (continued)
This article reproduced with permission of eFactory.
© 2002 eFactory Internet-Agentur KG Online-Marketing - written
by Markus Sobek
PageRank and Google are trademarks of Google Inc., Mountain View, CA,
USA.
PageRank is protected by US Patent 6,285,999.