| for different terms,
but of a few PageRanks for different topics. Topic-Sensitive PageRank
is based on the link structure of the whole web, whereby the topic
sensitivity implies that there is a different weighting for each
topic.
The basic principle of Haveliwala's approach has already been described
in our section on the "Yahoo-Bonus", where we discussed the possibility
of assigning a particular importance to certain web pages. In terms
of the Random Surfer Model, this is realized by increasing the
probability that the Random Surfer jumps to such a page after
"getting bored". Via links, this manual intervention in the PageRank
technique influences the PageRank of each page on the web. More
precisely, we influenced PageRank by introducing an additional
value E into the PageRank algorithm:
PR(A) = E(A) (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
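As a minimal sketch of how the value E enters the calculation, the
following Python snippet iterates the formula above on a tiny
hypothetical link graph. The page names, the damping factor d = 0.85,
the starting values, the number of iterations and the function name
are all assumptions made for illustration only.

```python
def pagerank_with_e(links, e, d=0.85, iterations=50):
    """Iterate PR(A) = E(A)(1-d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)).

    links maps each page to the list of pages it links to;
    e maps each page to its value E.
    """
    pr = {page: 1.0 for page in links}  # assumed starting value
    for _ in range(iterations):
        new_pr = {}
        for page in links:
            # Sum PR(T)/C(T) over all pages T that link to this page.
            inbound = sum(pr[t] / len(links[t]) for t in links if page in links[t])
            new_pr[page] = e[page] * (1 - d) + d * inbound
        pr = new_pr
    return pr


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}

uniform = pagerank_with_e(links, {"A": 1.0, "B": 1.0, "C": 1.0})
boosted = pagerank_with_e(links, {"A": 1.0, "B": 2.0, "C": 1.0})

for page in links:
    print(page, round(uniform[page], 3), round(boosted[page], 3))
```

Raising E for page B raises B's own PageRank and, through B's link,
that of page C as well.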
Haveliwala's Topic-Sensitive PageRank goes one step further. Instead
of assigning a universally higher value to a website or a web page,
Haveliwala differentiates between topics. For each of these topics,
he identifies a different set of authority pages. On the basis of
this evaluation, a different PageRank is calculated for each topic,
each separately, but always for the entire web.
For his experiments on Topic-Sensitive PageRank, Haveliwala has
chosen the 16 top-level categories of the Open Directory Project
both for the identification of topics and for the intervention in
PageRank. More precisely, Haveliwala assigns a higher value E to
the pages of those ODP categories for which he calculates PageRank.
If, for example, he calculates the PageRank for the topic "health",
all the ODP pages in the health category receive a relatively higher
value E, and they pass this value on in the form of PageRank to the
pages they link to. This PageRank is in turn passed on to further
pages and, if we assume that health-related websites tend to link
more often to other websites within that topic, pages on the topic
"health" generally receive a higher PageRank.
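The procedure described above can be sketched as follows: one PageRank
calculation per topic, each over the same link graph, with a higher E
for the pages listed under that topic. This is only an illustration
under stated assumptions. The page names, the boost factor of 5 and
both function names are hypothetical, and the text above only says
that E is "relatively higher" for the topic's pages, so the exact
weighting Haveliwala uses is not reproduced here.

```python
def biased_pagerank(links, e, d=0.85, iterations=50):
    """Iterate PR(A) = E(A)(1-d) + d * sum of PR(T)/C(T) over pages T linking to A."""
    pr = {page: 1.0 for page in links}  # assumed starting value
    for _ in range(iterations):
        pr = {
            page: e[page] * (1 - d)
            + d * sum(pr[t] / len(links[t]) for t in links if page in links[t])
            for page in links
        }
    return pr


def topic_sensitive_pageranks(links, topic_pages, boost=5.0):
    """Return one PageRank dict per topic, each calculated for the whole graph."""
    result = {}
    for topic, listed in topic_pages.items():
        # Pages listed under the topic get a higher E (assumed factor: boost).
        e = {page: (boost if page in listed else 1.0) for page in links}
        result[topic] = biased_pagerank(links, e)
    return result


# Hypothetical miniature web with two directory pages, one per topic.
links = {
    "health_dir": ["clinic", "pharmacy"],
    "clinic": ["pharmacy", "health_dir"],
    "pharmacy": ["clinic"],
    "sports_dir": ["team", "stadium"],
    "team": ["stadium", "sports_dir"],
    "stadium": ["team", "clinic"],
}
topic_pages = {"health": {"health_dir"}, "sports": {"sports_dir"}}

ranks = topic_sensitive_pageranks(links, topic_pages)
for topic, pr in ranks.items():
    print(topic, {page: round(value, 2) for page, value in pr.items()})
```

In this toy graph, the page "clinic" scores higher in the health
calculation than in the sports calculation, although it is not
itself listed in the directory.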
Haveliwala acknowledges the shortcomings of relying on the Open
Directory Project to identify topics, which results, for example,
in a high degree of dependence on ODP editors and in a rather coarse
subdivision into topics. But, as Haveliwala states, his method shows
good results and it can surely be improved without much effort.
However, one crucial point in Haveliwala's work on Topic-Sensitive
PageRank is the identification of the user's preferences. Having a
Topic-Sensitive PageRank is useless as long as we do not know which
topics an actual user is interested in. In the end, search results
must be based on the PageRank that best matches the user's
preferences, so Topic-Sensitive PageRank can only be used if these
preferences are known.
Indeed, Haveliwala does supply some practical approaches for
the identification of user preferences. He describes, for example,
search in context, where the user highlights terms on a web page.
In this way, the content of that web page could be an indicator of
what the user is looking for. At this point, we want to note the
potential of the Google Toolbar. The Toolbar submits data to Google
regarding search terms and pages that a user has visited. This data
can be used to create user profiles, which can then serve as a basis
for identifying the user's preferences. However, even without such
techniques, it is conceivable that a user simply chooses the topic
they are interested in before submitting a query.
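Once the user's topic is known, applying it can be as simple as
ordering the matching pages by the PageRank calculated for that
topic. The following is a minimal sketch with hypothetical page
names, scores and function name, and it ignores every other ranking
criterion a real search engine would apply.

```python
def rank_results(matching_pages, topic_pageranks, chosen_topic):
    """Order the pages matching a query by the PageRank of the chosen topic."""
    scores = topic_pageranks[chosen_topic]
    # Pages without a score for this topic are treated as 0.0 (an assumption).
    return sorted(matching_pages, key=lambda page: scores.get(page, 0.0), reverse=True)


# Hypothetical precomputed values: one PageRank per page for each topic.
topic_pageranks = {
    "health": {"marathon-injuries.example": 1.8, "marathon-results.example": 0.4},
    "sports": {"marathon-injuries.example": 0.3, "marathon-results.example": 2.1},
}
matches = ["marathon-results.example", "marathon-injuries.example"]

health_order = rank_results(matches, topic_pageranks, "health")
sports_order = rank_results(matches, topic_pageranks, "sports")
print(health_order)
print(sports_order)
```

The same query thus yields a different ordering depending on the
topic the user has chosen.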
10.
Theme-Based PageRank (continued)
This article reproduced with permission of eFactory.
© 2002 eFactory Internet-Agentur KG Online-Marketing - written
by Markus Sobek
PageRank and Google are trademarks of Google Inc., Mountain View, CA,
USA.
PageRank is protected by US Patent 6,285,999.
|