does not determine
the general importance of a web page but rather measures its negative
characteristics. For the sake of simplicity, this technique shall
be called "BadRank".
BadRank is in principle based on "linking to bad neighbourhoods".
If one page links to another page with a high BadRank, the first
page gets a high BadRank itself through this link. The similarities
to PageRank are obvious. The difference is that BadRank is not based
on the evaluation of inbound links of a web page but on its outbound
links. In this sense, BadRank represents a reversal of PageRank. In
a direct adaptation of the PageRank algorithm, BadRank would be given
by the following formula:
BR(A) = E(A) (1-d) + d (BR(T1)/C(T1) + ... + BR(Tn)/C(Tn))
where
BR(A) is the BadRank of page A,
BR(Ti) is the BadRank of the pages Ti to which page A links,
C(Ti) is, in this case, the number of inbound links of page Ti, and
d is the damping factor, which is again necessary.
In the previously discussed modifications of the PageRank algorithm,
E(A) represented the special evaluation of certain web pages. In
the BadRank algorithm, this value reflects whether a page was detected
by a spam filter or not. Without the value E(A), the BadRank algorithm
would be useless because it would be nothing but another analysis of
link structures that takes no further criteria into account.
By means of the BadRank algorithm, spam pages can be evaluated
first of all. A filter assigns a numeric value E(A) to them, which
can, for example, be based on the degree of spamming or, perhaps even
better, on their PageRank. Again, the sum of all E(A) has
to equal the total number of web pages. In the course of an iterative
computation, BadRank is transferred not only to pages which link
directly to spam pages but also onward to the pages linking to those.
In this way, BadRank is able to identify regions of the
web where spam tends to occur relatively often, just as PageRank
identifies regions of the web which are of general importance.
Of course, BadRank and PageRank differ significantly, especially
because they use outbound and inbound links, respectively. Our
example shows a simple, hierarchically structured website that
reflects common link structures pretty well. Each page links to
every page which is on a higher hierarchical level and on its branch
of the website's tree structure. Each page also links to the pages
which are arranged directly below it and, additionally, pages on the
same branch and the same hierarchical level link to each other. The
following table shows the distribution of inbound and outbound links
for the hierarchical levels of such a site.

Level            0    1    2
Inbound links    6    4    2
Outbound links   2    4    3

As expected, the number of inbound links decreases level by level
from the index page downwards. In contrast, we find
the highest number of outbound links on the website's mid-level.
We see similar results when we add another level of pages to
our website while the linking rules described above stay the same.

Level            0    1    2    3
Inbound links    14   8    4    2
Outbound links   2    4    5    4

Again, there is a concentration of outbound links on the website's
mid-level. But most of all, the outbound links are much more evenly
distributed than the inbound links.
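
The link counts per level follow mechanically from the three linking rules, so they can be recomputed for any number of levels. The following Python sketch is only an illustration and not part of the original article. It assumes that every page has exactly two child pages (which matches the seven pages A to G of the example) and that "same branch and same level" means pages sharing a parent. The names `build_site`, `counts_per_level` and `fanout` are hypothetical.

```python
from itertools import product


def build_site(levels, fanout=2):
    """Return {page: set of link targets} for a tree-shaped site.

    Pages are tuples of child indices; the empty tuple () is the index page.
    The fanout of 2 is an assumption taken from the seven-page example.
    """
    pages = [path
             for depth in range(levels)
             for path in product(range(fanout), repeat=depth)]
    links = {}
    for page in pages:
        targets = set()
        # Rule 1: link to every page higher up on the same branch (all ancestors).
        targets.update(page[:i] for i in range(len(page)))
        # Rule 2: link to the pages directly below.
        if len(page) < levels - 1:
            targets.update(page + (i,) for i in range(fanout))
        # Rule 3: link to pages on the same level that share the same parent.
        if page:
            targets.update(page[:-1] + (i,)
                           for i in range(fanout) if i != page[-1])
        links[page] = targets
    return links


def counts_per_level(levels):
    """Return {level: (inbound links, outbound links)} for one page of each level."""
    links = build_site(levels)
    inbound = {page: 0 for page in links}
    for targets in links.values():
        for target in targets:
            inbound[target] += 1
    # All pages of one level have identical counts, so any page represents its level.
    return {len(page): (inbound[page], len(targets))
            for page, targets in links.items()}


print(counts_per_level(3))  # {0: (6, 2), 1: (4, 4), 2: (2, 3)}
print(counts_per_level(4))  # {0: (14, 2), 1: (8, 4), 2: (4, 5), 3: (2, 4)}
```

Under these rules the index page of the four-level site collects 2 + 4 + 8 = 14 inbound links, one from every other page, while no page has more than five outbound links.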
If we assign a value of 100 to the index page's E(A) in our original
example while all other values of E equal 1, and if the damping factor
d is 0.85, we get the following BadRank values:

Page       A       B/C     D/E/F/G
BadRank    22.39   17.39   12.21

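
These values can be reproduced by iterating the BadRank formula over the seven pages of the example site. The Python sketch below is an illustration under stated assumptions, not the author's implementation. It assumes that B and C lie below the index page A, that D and E lie below B, and that F and G lie below C. The function name `badrank`, its parameters and the fixed number of iterations are hypothetical choices.

```python
def badrank(outlinks, e, d=0.85, iterations=200):
    """Iterate BR(A) = E(A)(1-d) + d * sum(BR(T)/C(T)) over the pages T that A links to."""
    # C(T): number of inbound links of each page T.
    inbound = {page: 0 for page in outlinks}
    for targets in outlinks.values():
        for target in targets:
            inbound[target] += 1

    br = {page: 1.0 for page in outlinks}  # arbitrary starting values
    for _ in range(iterations):
        # Each pass computes all new values from the values of the previous pass.
        br = {
            page: e[page] * (1 - d)
            + d * sum(br[t] / inbound[t] for t in targets)
            for page, targets in outlinks.items()
        }
    return br


# Assumed layout of the example site: A is the index page.
site = {
    "A": ["B", "C"],
    "B": ["A", "C", "D", "E"],
    "C": ["A", "B", "F", "G"],
    "D": ["A", "B", "E"],
    "E": ["A", "B", "D"],
    "F": ["A", "C", "G"],
    "G": ["A", "C", "F"],
}

e = {page: 1.0 for page in site}
e["A"] = 100.0  # only the index page was flagged by the spam filter

result = badrank(site, e)
for page in sorted(result):
    print(page, round(result[page], 2))
```

Because every page T passes BR(T)/C(T) to each of the C(T) pages linking to it, the total BadRank in this closed example equals the sum of all E values, here 106.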
First of all, we see that the BadRank spreads from the index
page to all other pages of the website. The combination of PageRank
and BadRank will be discussed in detail below but, no matter how
the combination is realized, it is obvious that the two can neutralize
each other very well. After all, we can assume that a page's
PageRank also decreases the lower its hierarchy level is, so that a
PR0 can easily be achieved for all pages.
If we now assume that the lower-level page G links
to a page X with a constant BadRank BR(X)=10, where the link from
page G is the only inbound link of page X, and if all values of E
for our example website equal 1, we get, at a damping factor d of
0.85, the following values:

Page       A      B      C       D      E      F       G
BadRank    4.82   7.50   14.50   4.22   4.22   11.22   17.18

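
This scenario can be checked numerically by holding BR(X) constant while the formula is iterated for the pages of the site. The following Python sketch is an illustration only. It assumes that D and E lie below B and that F and G lie below C, and it treats X as a page outside the site whose only inbound link comes from G. The function name `badrank_with_fixed` and the `fixed` parameter are hypothetical.

```python
def badrank_with_fixed(outlinks, e, fixed, d=0.85, iterations=200):
    """Iterate the BadRank formula while the pages in `fixed` keep constant values."""
    # C(T): number of inbound links of each linked page T, including external ones.
    inbound = {}
    for targets in outlinks.values():
        for target in targets:
            inbound[target] = inbound.get(target, 0) + 1

    br = {page: 1.0 for page in outlinks}  # arbitrary starting values
    br.update(fixed)
    for _ in range(iterations):
        new = {
            page: e[page] * (1 - d)
            + d * sum(br[t] / inbound[t] for t in targets)
            for page, targets in outlinks.items()
        }
        new.update(fixed)  # BR(X) is never recomputed
        br = new
    return br


# Assumed layout of the example site; G additionally links to the external page X.
site = {
    "A": ["B", "C"],
    "B": ["A", "C", "D", "E"],
    "C": ["A", "B", "F", "G"],
    "D": ["A", "B", "E"],
    "E": ["A", "B", "D"],
    "F": ["A", "C", "G"],
    "G": ["A", "C", "F", "X"],
}

e = {page: 1.0 for page in site}
result = badrank_with_fixed(site, e, fixed={"X": 10.0})
for page in "ABCDEFG":
    print(page, round(result[page], 2))
```

G receives d * BR(X)/C(X) = 0.85 * 10 = 8.5 directly from X. From there the BadRank reaches the pages that link to G, which is why C and F end up well above B, D and E.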
In this case, we see that the distribution of the BadRank is less
homogeneous than in the first scenario. Nonetheless, BadRank is
distributed among all pages of the website. The
relatively low BadRank of the index page A is remarkable. It could
be a problem to neutralize its PageRank, which should be higher than
that of the other pages. This effect is not really desirable, but
it reflects the experiences of numerous webmasters. Quite often,
we can see the phenomenon that all pages of a website except for the
index page show a PR0 in the Google Toolbar, while the index
page often has a Toolbar PageRank between 2 and 4. Therefore, we
can probably assume that this special variant of PR0 is not caused
by a spam filter detecting the website in question; rather,
the site received a penalty for "linking to bad neighbourhoods".
However, it is also possible that this variant of PR0 occurs when
only hierarchically lower pages of a website get trapped in a spam
filter.
11. PR0 - Google's PageRank 0 Penalty (continued)
This article reproduced with permission of eFactory.
© 2002 eFactory Internet-Agentur KG Online-Marketing - written
by Markus Sobek
PageRank and Google are trademarks of Google Inc., Mountain View, CA,
USA.
PageRank is protected by US Patent 6,285,999.