PageRank Algorithm

Google rates pages based on the way that others perceive and describe them, but it has also become our main source of information, and so it effectively continues to rate pages based on their adherence to its own delicately prescribed popularity contest.
The main criterion is relevance, as measured by people linking to and citing a page: the more something is cited, the more important it is taken to be. This makes sense especially with scientific papers, where work builds on top of other work that it needs to cite and reference, so the number of times a paper is cited is an obvious measure of importance in some ways. But the main challenge is how to measure that relevance objectively.
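As a rough baseline for what “measured by citations” could mean, a naive version would simply count incoming links. The toy graph and the counting below are my own made-up illustration, not anything from the paper:

```python
# Naive relevance baseline: count how many pages link to (cite) each page.
# 'links' is a hypothetical toy graph mapping each page to the pages it links to.
from collections import Counter

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

in_links = Counter(target for targets in links.values() for target in targets)
print(in_links)  # C is cited three times, A and B once each, D never
```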
Because of the way the PageRank algorithm works, the more time you spend on a given page, the more relevant it is taken to be. But the time itself is not the measure: you would statistically spend more time there because more links are likely to bring you there. If you spend an equal amount of time on each page while crawling, then spending more total time on a given page means that more links from other pages are getting you there. The fact that time correlates this way as a result of the PageRank system, and that it makes sense in an abstracted analogy of somebody spending more time on more ‘relevant’ pages, is interesting, but probably not conclusive. At the core, relevance is still determined by random jumping and by the determined relevance of other pages. I think the averaging and compositing of a lot of data leads to the “intelligence of the crowd” idea, which is also interesting but has its flaws.
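To make the random-surfer reading concrete, here is a minimal power-iteration sketch of PageRank in Python. The toy graph, the damping factor of 0.85, and the fixed iteration count are illustrative assumptions on my part, not the paper’s actual implementation:

```python
# Minimal PageRank power-iteration sketch (illustrative only).
# 'links' maps each page to the pages it links to; the damping factor d models
# the random jump: with probability 1 - d the surfer jumps to a random page.

def pagerank(links, d=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}                 # start uniform
    for _ in range(iterations):
        new_rank = {p: (1.0 - d) / n for p in pages}   # random-jump share
        for p, outgoing in links.items():
            if not outgoing:                           # dangling page: spread its rank evenly
                for q in pages:
                    new_rank[q] += d * rank[p] / n
            else:
                share = d * rank[p] / len(outgoing)    # each link passes rank / N
                for q in outgoing:
                    new_rank[q] += share
        rank = new_rank
    return rank

# Hypothetical toy graph: same four pages as above.
toy = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
print(pagerank(toy))
```

The ranks converge toward the stationary distribution of that random surfer, which is exactly the “time spent on a page” intuition described above.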
The PageRank idea was created in response to the fast growth of the internet, where content is not sorted by importance or type on its own, but everything from scientific journals to photo blog posts from third graders is ‘published’ equally. I think in many ways this led to a kind of breaking down of traditional boundaries between disciplines, work, actions, and everything else. Google attempts to maintain order to some extent, to allow us to navigate through this world while keeping some of the traditional associations we have in the world, but it attempts to do so systematically, algorithmically, and in a non-biased way. Of course there is no systematic approach to traditional associations, and although Google seems to have found some useful and working patterns, these also serve to expose some of the inconsistency and subjectivity in our traditional social, academic, and other associations. For example, when Google decides that something is related in a way that completely surprises you but also makes sense at the same time, that is an untraditional association which would have been obvious had we been algorithmically programmed. On the other hand, they may simply not have found the perfect algorithm that works for everybody in the world yet.
“One of the main causes of this problem is that the number of documents in the indices has been increasing by many orders of magnitude, but the user’s ability to look at documents has not. People are still only willing to look at the first few tens of results.” Scalability is key. This includes efficient, fast performance and low storage requirements, hence compact hash maps and compressed sparse matrix representations.
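To make the sparse-matrix point concrete, one way (my own sketch using SciPy, not the data structures described in the paper) is to store only the links that actually exist, in compressed sparse row form:

```python
# Sketch: storing a link graph as a compressed sparse matrix instead of a dense one.
# Only the non-zero entries (actual links) are stored, which is what lets
# web-scale graphs fit in memory. The indices here are made up.
import numpy as np
from scipy.sparse import csr_matrix

n_pages = 5
# (source, target) pairs for the links that exist; everything else is zero.
src = np.array([0, 0, 1, 2, 3, 4])
dst = np.array([1, 2, 2, 0, 2, 0])
weights = np.ones(len(src))

adjacency = csr_matrix((weights, (src, dst)), shape=(n_pages, n_pages))

# A dense 5x5 matrix stores 25 numbers; the sparse one stores only the 6 links.
print(adjacency.nnz, "stored entries vs", n_pages * n_pages, "dense entries")
```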
Links seem to have different values: links from relevant pages are more valuable. From the matrix I also inferred, though likely wrongly, that links from pages with fewer outgoing links are more valuable, since the value 1/N would be greater.
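That reading matches the usual 1/N formulation: a page’s rank is split evenly across its N outgoing links, so a link from a page with only a couple of outgoing links passes on a much larger share than a link from a page with hundreds. A tiny made-up comparison:

```python
# Hypothetical comparison of how much rank a single link passes on.
# A page's rank is split evenly across its N outgoing links (the 1/N entries
# in the transition matrix), so fewer outgoing links means a bigger share.
focused_rank, focused_outlinks = 0.02, 2        # page linking to only 2 others
directory_rank, directory_outlinks = 0.02, 200  # page linking to 200 others

print("share from focused page:  ", focused_rank / focused_outlinks)      # 0.01
print("share from directory page:", directory_rank / directory_outlinks)  # 0.0001
```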
In the Google paper, the authors really stressed the academic and research aspirations of Google, contrasting them with the more commonly commercially focused engines. They stressed the goal of building a solid architecture and data sets that could provide the basis for novel experiments and experiences.
