Apache Lucene: Search engine library whose features include fast indexing, ranked searching, boolean, phrase, and span queries, date-range searching, and extension APIs. The site offers news, details of sub-projects, documentation, and downloads. [Open Source]

Authoritative Sources in a Hyperlinked Environment: HITS is a link-structure analysis algorithm that scores pages as "authorities" (pages which have many incoming links and provide the best source of information on a given topic) and as "hubs" (pages which have many outgoing links and provide useful lists of possibly relevant pages). Ranking is performed at query time.
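
The mutual reinforcement between hubs and authorities can be illustrated with a minimal sketch. The function name `hits`, the dictionary-of-lists graph format, the fixed iteration count, and the Euclidean normalization are assumptions made for this illustration, not details taken from the paper's text.

```python
from math import sqrt


def hits(graph, iterations=50):
    """Return (hubs, authorities) for a graph given as {page: [linked pages]}.

    Illustrative sketch only: a fixed iteration count stands in for a
    proper convergence test.
    """
    # Collect every page that appears as a source or as a link target.
    nodes = set(graph) | {dst for targets in graph.values() for dst in targets}
    hubs = {n: 1.0 for n in nodes}
    auths = {n: 1.0 for n in nodes}

    def normalize(scores):
        # Scale scores to unit Euclidean length (assumed normalization).
        norm = sqrt(sum(v * v for v in scores.values())) or 1.0
        return {n: v / norm for n, v in scores.items()}

    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        new_auths = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for dst in targets:
                new_auths[dst] += hubs[src]
        auths = normalize(new_auths)

        # Hub score: sum of authority scores of the pages linked to.
        new_hubs = {n: sum(auths[dst] for dst in graph.get(n, ())) for n in nodes}
        hubs = normalize(new_hubs)

    return hubs, auths


if __name__ == "__main__":
    # Hypothetical three-page graph: "a" and "b" both link to "c".
    hubs, auths = hits({"a": ["c"], "b": ["c"], "c": []})
    print("hubs:", hubs)
    print("authorities:", auths)
```

In this toy graph, "c" ends up as the only authority, while "a" and "b" share the hub score equally.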

The PageRank Citation Ranking: Bringing Order to the Web: The first Stanford paper about PageRank, a static ranking which interprets a link from page A to page B as a vote, by page A, for page B. The Web is seen as a directed graph, and votes propagate recursively from node to node. Ranking is performed at indexing time. Used by Google.
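
The recursive propagation of votes can be sketched as a simple power iteration. The function name `pagerank`, the graph format, the damping value of 0.85, the fixed iteration count, and the uniform redistribution of rank from pages without outgoing links are assumptions made for this illustration; production systems differ in many details.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Return {page: rank} for a graph given as {page: [linked pages]}.

    Illustrative sketch only: a fixed iteration count stands in for a
    proper convergence test.
    """
    # Collect every page that appears as a source or as a link target.
    nodes = sorted(set(graph) | {dst for targets in graph.values() for dst in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread uniformly
        # over all pages (an assumed way of handling dangling nodes).
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}

        # Each page splits its damped rank equally among the pages it links to.
        for src in nodes:
            targets = graph.get(src, ())
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank

    return rank


if __name__ == "__main__":
    # Hypothetical graph: "a" and "b" vote for "c", and "c" votes for "a".
    for page, score in sorted(pagerank({"a": ["c"], "b": ["c"], "c": ["a"]}).items()):
        print(page, round(score, 4))
```

Because the scores depend only on the link graph and not on the query, they can be computed once, ahead of query time, and stored alongside the index.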

Vivisimo Clustering Engine: Document clustering and meta-search software that automatically categorizes search results on the fly into hierarchical clusters.

Published by World Readable