PR(A)
is the PageRank of page A,
PR(Ti)
is the PageRank of pages Ti which link to page A,
C(Ti)
is the number of outbound links on page Ti and
d
is a damping factor which can be set between 0 and 1.
So, first of all, we see that PageRank does not rank web sites
as a whole, but is determined for each page individually. Further,
the PageRank of page A is recursively defined by the PageRanks of
those pages which link to page A.
The PageRank of pages Ti which link to page A does not influence
the PageRank of page A uniformly. Within the PageRank algorithm,
the PageRank of a page T is always weighted by the number of outbound
links C(T) on page T. This means that the more outbound links a
page T has, the less will page A benefit from a link to it on page
T.
The weighted PageRank of pages Ti is then added up. The outcome
of this is that an additional inbound link for page A will always
increase page A's PageRank.
Finally, the sum of the weighted PageRanks of all pages Ti is multiplied
with a damping factor d which can be set between 0 and 1. Thereby,
the extend of PageRank benefit for a page by another page linking
to it is reduced.
The Random Surfer Model
In their publications, Lawrence Page and Sergey Brin give a very
simple intuitive justification for the PageRank algorithm. They
consider PageRank as a model of user behaviour, where a surfer clicks
on links at random with no regard towards content.
The random surfer visits a web page with a certain probability
which derives from the page's PageRank. The probability that the
random surfer clicks on one link is solely given by the number of
links on that page. This is why one page's PageRank is not completely
passed on to a page it links to, but is devided by the number of
links on the page.
So, the probability for the random surfer reaching one page is
the sum of probabilities for the random surfer following links to
this page. Now, this probability is reduced by the damping factor
d. The justification within the Random Surfer Model, therefore,
is that the surfer does not click on an infinite number of links,
but gets bored sometimes and jumps to another page at random.
The probability for the random surfer not stopping to click on
links is given by the damping factor d, which is, depending on the
degree of probability therefore, set between 0 and 1. The higher
d is, the more likely will the random surfer keep clicking links.
Since the surfer jumps to another page at random after he stopped
clicking links, the probability therefore is implemented as a constant
(1-d) into the algorithm. Regardless of inbound links, the probability
for the random surfer jumping to a page is always (1-d), so a page
has always a minimum PageRank.
2.
The PageRank Algorithm (continued)
This article reproduced with permission of eFactory.
© 2002 eFactory Internet-Agentur KG Online-Marketing - written
by Markus Sobek
PageRank and Google are trademarks of Google Inc., Mountain ViewCA,
USA.
PageRank is protected by US Patent 6,285,999.
|