The effect of dangling links
can be illustrated with a small example website consisting of
three pages A, B and C. In our example, pages A and B link to
each other. Additionally, page A links to page C. Page C itself
has no outbound links to other pages. At a damping factor of
0.75, we get the following equations for the individual pages'
PageRank values:
PR(A) = 0.25 + 0.75 PR(B)
PR(B) = 0.25 + 0.375 PR(A)
PR(C) = 0.25 + 0.375 PR(A)
Solving the equations gives us the following PageRank values:
PR(A) = 14/23
PR(B) = 11/23
PR(C) = 11/23
So, the accumulated PageRank of all three pages is 36/23, which
is just over half the value of 3 that we could have expected if
page C had links to one of the other pages. According to Page and
Brin, the number of dangling links in Google's index is fairly high.
One reason is that many linked pages are not indexed by Google,
for example because indexing is disallowed by a robots.txt file.
Additionally, Google now indexes several file types, not only
HTML. PDF or Word files do not really have outbound links and,
hence, dangling links could have a major impact on PageRank.
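The values for the three-page example can be checked numerically. The following is a minimal sketch in Python, assuming the formula implied by the equations: PR(p) = (1 - d) + d * (sum of PR(q) / C(q) over all pages q linking to p), where C(q) is the number of outbound links on q. The names `links` and `pagerank` are illustrative and do not come from the original article.

```python
# Link structure of the example: A and B link to each other, A also links to C.
links = {"A": ["B", "C"], "B": ["A"], "C": []}
d = 0.75  # damping factor used in the example


def pagerank(links, d, iterations=100):
    """Repeatedly apply PR(p) = (1 - d) + d * sum(PR(q) / C(q)) until it settles."""
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        new = {}
        for page in links:
            # Sum the shares passed on by every page q that links to this page.
            inbound = sum(pr[q] / len(links[q]) for q in links if page in links[q])
            new[page] = (1 - d) + d * inbound
        pr = new
    return pr


pr = pagerank(links, d)
total = sum(pr.values())
print(pr)     # approaches A = 14/23, B = 11/23, C = 11/23
print(total)  # approaches 36/23, well below the number of pages (3)
```

Page C collects PageRank from A but passes nothing on, which is why the total falls short of 3.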
To shield PageRank
from the negative effects of dangling links, pages without outbound
links have to be removed from the database until the PageRank
values are computed. According to Page and Brin, the number
of outbound links on pages with dangling links is thereby
normalised. As shown in our illustration, removing one page can create
new dangling links and, hence, removing pages has to be an iterative
process. After the PageRank calculation is finished, PageRank can
be assigned to the formerly removed pages based on the PageRank algorithm.
This requires as many iterations as were needed for removing the pages.
In our illustration, page C could be processed before page
B. At that point, page B has no PageRank yet, so page C will
not receive any either. Page B then receives PageRank from page A
and, during the second iteration, page C gets its PageRank as well.
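The iterative removal described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure Google actually uses: the function name `remove_dangling` and the four-page graph are hypothetical, chosen so that removing page C turns page B into a new dangling page.

```python
def remove_dangling(links):
    """Repeatedly drop pages without outbound links.

    Returns the remaining graph and the list of removal rounds.
    """
    remaining = {page: set(targets) for page, targets in links.items()}
    rounds = []
    while True:
        dangling = sorted(p for p, targets in remaining.items() if not targets)
        if not dangling:
            return remaining, rounds
        rounds.append(dangling)
        for page in dangling:
            del remaining[page]
        # Links to removed pages disappear, which may create new dangling pages.
        for targets in remaining.values():
            targets.difference_update(dangling)


# Hypothetical graph: A and D link to each other, A links to B, B links to C.
example = {"A": ["B", "D"], "B": ["C"], "C": [], "D": ["A"]}
core, rounds = remove_dangling(example)
print(rounds)        # [['C'], ['B']]
print(sorted(core))  # ['A', 'D']
```

Here two removal rounds are needed, so by the reasoning above two iterations are also needed to assign PageRank back to B and C.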
In our example website for dangling links, removing page
C from the database results in pages A and B each having a PageRank
of 1. After the calculations, page C is assigned a PageRank of 0.25
+ 0.375 PR(A) = 0.625. So, the accumulated PageRank does not equal
the number of pages, but at least the pages which have outbound
links are not harmed by the dangling links problem.
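The arithmetic for the example with page C removed can be retraced with exact fractions. This is a small sketch that assumes the same formula as the equations above. The variable names are illustrative only.

```python
from fractions import Fraction

d = Fraction(3, 4)  # damping factor 0.75

# With C removed, A has a single outbound link left (to B), so the system is
#   PR(A) = (1 - d) + d * PR(B)
#   PR(B) = (1 - d) + d * PR(A)
# Substituting the second equation into the first and solving for PR(A):
pr_a = ((1 - d) + d * (1 - d)) / (1 - d * d)
pr_b = (1 - d) + d * pr_a

# C is assigned its value afterwards, using A's original count of two links.
pr_c = (1 - d) + d * pr_a / 2

total = pr_a + pr_b + pr_c
print(pr_a, pr_b, pr_c, total)  # 1 1 5/8 21/8
```

The total of 21/8 = 2.625 is still below 3, but A and B keep the PageRank of 1 they would have without the dangling link.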
Because dangling links are removed from the database, they do not have
any negative effects on the PageRank of the rest of the web. Since
links to PDF files are dangling links, they do not diminish
the PageRank of the linking page or site. So, PDF files can be a
good means of search engine optimisation for Google.
This article reproduced with permission of eFactory.
© 2002 eFactory Internet-Agentur KG Online-Marketing - written
by Markus Sobek
PageRank and Google are trademarks of Google Inc., Mountain View, CA,
USA.
PageRank is protected by US Patent 6,285,999.