where N is the total
number of all pages on the web. The second version of the algorithm,
indeed, does not differ fundamentally from the first one. Regarding
the Random Surfer Model, the second version's PageRank of a page
is the actual probability for a surfer reaching that page after
clicking on many links. The PageRanks then form a probability distribution
over web pages, so the sum of all pages' PageRanks will be one.
Contrary, in the first version of the algorithm the probability
for the random surfer reaching a page is weighted by the total number
of web pages. So, in this version PageRank is an expected value
for the random surfer visiting a page, when he restarts this procedure
as often as the web has pages. If the web had 100 pages and a page
had a PageRank value of 2, the random surfer would reach that page
in an average twice if he restarts 100 times.
As mentioned above, the two versions of the algorithm do not differ
fundamentally from each other. A PageRank which has been calculated
by using the second version of the algorithm has to be multiplied
by the total number of web pages to get the according PageRank that
would have been caculated by using the first version. Even Page
and Brin mixed up the two algorithm versions in their most popular
paper "The Anatomy of a Large-Scale Hypertextual Web Search
Engine", where they claim the first version of the algorithm
to form a probability distribution over web pages with the sum of
all pages' PageRanks being one.
In the following, we will use the first version of the algorithm.
The reason is that PageRank calculations by means of this algorithm
are easier to compute, because we can disregard the total number
of web pages.
The Characteristics of PageRank
The characteristics of PageRank shall be illustrated by a small
example.
 |
 |
We regard a small web consisting
of three pages A, B and C, whereby page A links to the pages
B and C, page B links to page C and page C links to page A.
According to Page and Brin, the damping factor d is usually
set to 0.85, but to keep the calculation simple we set it to
0.5. The exact value of the damping factor d admittedly has
effects on PageRank, |
 |
 |
but it does not influence the fundamental principles of PageRank.
So, we get the following equations for the PageRank calculation:
PR(A) = 0.5 + 0.5 PR(C)
PR(B) = 0.5 + 0.5 (PR(A) / 2)
PR(C) = 0.5 + 0.5 (PR(A) / 2 + PR(B))
These equations can easily be solved. We get the following PageRank
values for the single pages:
PR(A) = 14/13 = 1.07692308
PR(B) = 10/13 = 0.76923077
PR(C) = 15/13 = 1.15384615
It is obvious that the sum of all pages' PageRanks is 3 and thus
equals the total number of web pages. As shown above this is not
a specific result for our simple example.
For our simple three-page example it is easy to solve the according
equation system to determine PageRank values. In practice, the web
consists of billions of documents and it is not possible to find
a solution by inspection.
2.
The PageRank Algorithm (continued)
This article reproduced with permission of eFactory.
© 2002 eFactory Internet-Agentur KG Online-Marketing - written
by Markus Sobek
PageRank and Google are trademarks of Google Inc., Mountain ViewCA,
USA.
PageRank is protected by US Patent 6,285,999.
|