But it is doubtful
that it is actually implemented.
We do not want to claim that we have shown the only way of weighting
links on the basis of text analyses. Indeed, there are certainly
dozens of others. However, the approach that we presented here is
based on publications by important members of Google's staff and,
thus, we want to base a critical evaluation on it.
As always when talking about PageRank, the question arises
whether our approach is sufficiently scalable. On the one hand, it causes
additional memory requirements. After all, Stata, Bharat and Maghoul
describe the system architecture of a term vector database which
is different from Google's inverted index, since it maps from page
IDs to terms and therefore can hardly be integrated into the existing architecture.
At the current size of Google's index, the additional memory requirements
should amount to several hundred GB to a few TB. However, this should not
be so much of a problem since Google's index is most certainly several
times bigger. In fact, the time requirements for building the database
and for computing the weightings appear to be the critical part.
Building a term vector database should be approximately as time-consuming
as building an inverted index. Of course, many processes can probably
be shared between the two, but if, for instance, the weighting of
terms in the term vectors has to differ from the weighting of terms
in the inverted index, the time requirements remain substantial.
If we assume that, as in our approach, content analyses are based
on computing the inner products of topic affinity vectors which
have to be calculated by matching term vectors and topic vectors,
this process should be approximately as time-consuming as computing
PageRank. Moreover, we have to consider that the PageRank calculations
themselves become more complicated when links are weighted.
So, the additional time requirements are definitely not negligible.
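
To make the cost of these steps more tangible, here is a minimal Python sketch of the pipeline discussed above: term vectors are matched against topic vectors to obtain topic affinity vectors, the inner product of two affinity vectors yields a link weight, and the weights then enter the PageRank iteration. All names, the cosine matching, the `floor` value and the toy data are our own assumptions for illustration. The sketch also uses the normalized form of PageRank, in which all values sum to 1. It is not a description of what Google actually implements.

```python
import math

def cosine(a, b):
    """Cosine similarity of two sparse vectors given as {term: weight} dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def topic_affinity(term_vector, topic_vectors):
    """Match one page's term vector against every topic vector."""
    return [cosine(term_vector, tv) for tv in topic_vectors]

def link_weight(aff_src, aff_dst, floor=0.1):
    """Inner product of two topic affinity vectors.
    The floor is an assumption: unrelated links keep a small weight."""
    return floor + sum(x * y for x, y in zip(aff_src, aff_dst))

def weighted_pagerank(links, weights, d=0.85, iterations=50):
    """links: {page: [targets]}, weights: {(source, target): weight}."""
    pages = set(links) | {t for ts in links.values() for t in ts}
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        nxt = {p: (1 - d) / n for p in pages}
        for src in pages:
            targets = links.get(src, [])
            total = sum(weights[(src, t)] for t in targets)
            if total == 0:
                # Page without outgoing links: spread its rank evenly.
                for p in pages:
                    nxt[p] += d * pr[src] / n
                continue
            for t in targets:
                # Each link passes on rank in proportion to its weight.
                nxt[t] += d * pr[src] * weights[(src, t)] / total
        pr = nxt
    return pr

# Toy data (hypothetical): two topics and three pages.
topics = [{"car": 1, "engine": 1}, {"recipe": 1, "cooking": 1}]
term_vectors = {
    "A": {"car": 2, "engine": 1},
    "B": {"car": 1, "engine": 2},
    "C": {"recipe": 3, "cooking": 1},
}
links = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}

affinity = {p: topic_affinity(tv, topics) for p, tv in term_vectors.items()}
weights = {(s, t): link_weight(affinity[s], affinity[t])
           for s, ts in links.items() for t in ts}
ranks = weighted_pagerank(links, weights)
```

In this toy example, page A links to both B and C, but only B shares A's topic, so B receives most of the PageRank that A passes on. Even in this small form, one similarity computation per page and topic and one inner product per link are needed before the PageRank iteration itself can start.
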
This is why we have to ask ourselves whether weighting links based on
text analyses is useful at all. Links between thematically unrelated
pages, which have been placed for the sole purpose of boosting the PageRank
of one page, may be annoying, but they are most certainly only a
small fraction of all links. Additionally, the web itself is completely
inhomogeneous. Google, Yahoo or the ODP do not owe their high PageRank
solely to links from other search engines or directories. A large
share of the links on the web are simply not placed for the purpose
of showing visitors the way to thematically related information.
Indeed, the motivation for placing links is manifold. Moreover,
what are probably the most popular websites are completely inhomogeneous
in terms of theme. Think about portals like Yahoo or news websites
that contain articles covering almost every area of life. A
strong weighting of links as described here could influence
those websites' PageRank significantly.
If the PageRank technique is not to become entirely futile, links
can only be weighted to a small extent. This, of course,
raises the question whether the effort it requires is justifiable.
After all, there are certainly other ways to eliminate spam, which
often reaches the top of search results through thematically unrelated
and probably purchased links.
This article reproduced with permission of eFactory.
© 2002 eFactory Internet-Agentur KG Online-Marketing - written
by Markus Sobek
PageRank and Google are trademarks of Google Inc., Mountain View, CA,
USA.
PageRank is protected by US Patent 6,285,999.