This project is a sample of how to work with Hadoop. It contains three jobs that parse a Wikipedia dump, calculate the PageRank of its pages, and order the pages by rank. The source accompanies the Xebia blog post: https://xebia.com/blog/wiki-pagerank-with-hadoop/
Requires:
- Maven
- Hadoop cluster with HDFS
- Wikipedia dump input file: http://dumps.wikimedia.org/nlwiki/latest/nlwiki-latest-pages-articles.xml.bz2
- Eclipse with Hadoop plugin
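
To give a feel for what the calculate and order jobs do, here is a minimal in-memory sketch in plain Java, without Hadoop. It is not the project's code: the class name `PageRankSketch`, the methods `calculate` and `order`, the damping factor of 0.85, the starting rank of 1.0, and the rank formula `(1 - d) + d * sum(rank / outgoing links)` are all assumptions made for illustration. It also assumes the parse step has already produced a map from each page to its outgoing links.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; the real jobs run on Hadoop and read from HDFS.
class PageRankSketch {

    // "Calculate" step: repeat the rank update a fixed number of times.
    // links maps a page title to the titles it links to (assumed output of the parse step).
    static Map<String, Double> calculate(Map<String, List<String>> links,
                                         double damping, int iterations) {
        // Every page that appears as a source or a target starts with rank 1.0 (assumption).
        Map<String, Double> ranks = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : links.entrySet()) {
            ranks.put(entry.getKey(), 1.0);
            for (String target : entry.getValue()) {
                ranks.put(target, 1.0);
            }
        }

        for (int i = 0; i < iterations; i++) {
            // Each page gets the base value (1 - damping) ...
            Map<String, Double> next = new HashMap<>();
            for (String page : ranks.keySet()) {
                next.put(page, 1.0 - damping);
            }
            // ... plus an equal share of the damped rank of every page linking to it.
            for (Map.Entry<String, List<String>> entry : links.entrySet()) {
                List<String> targets = entry.getValue();
                if (targets.isEmpty()) {
                    continue; // pages without outgoing links pass on nothing
                }
                double share = damping * ranks.get(entry.getKey()) / targets.size();
                for (String target : targets) {
                    next.merge(target, share, Double::sum);
                }
            }
            ranks = next;
        }
        return ranks;
    }

    // "Order" step: page titles sorted from highest to lowest rank.
    static List<String> order(Map<String, Double> ranks) {
        List<String> pages = new ArrayList<>(ranks.keySet());
        pages.sort(Comparator.comparingDouble((String p) -> ranks.get(p)).reversed());
        return pages;
    }

    public static void main(String[] args) {
        // Tiny made-up link graph standing in for the parsed Wikipedia dump.
        Map<String, List<String>> links = new HashMap<>();
        links.put("A", List.of("C"));
        links.put("B", List.of("C"));
        links.put("C", List.of("A"));

        Map<String, Double> ranks = calculate(links, 0.85, 20);
        for (String page : order(ranks)) {
            System.out.println(page + "\t" + ranks.get(page));
        }
    }
}
```

In the Hadoop version each of these steps would be a separate job reading the previous job's output from HDFS; the sketch keeps everything in one process only to make the data flow easy to follow.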