
Tag Archives: Nutch

Nutch 2.X Tutorial

Introduction: This document describes how to get Nutch 2.X to use HBase as a storage backend for Gora. It assumes a working knowledge of configuring Nutch 1.X, because configuration in 2.X is currently more complex. It is important to take this into consideration before progressing any further. We therefore strongly…


Two ways to install Nutch

Option 1: Set up Nutch from a binary distribution. Download a binary package (apache-nutch-1.X-bin.zip) from here. Unzip the package; this should produce a folder named apache-nutch-1.X. Change into it with cd apache-nutch-1.X/. From now on, ${NUTCH_RUNTIME_HOME} refers to this directory (apache-nutch-1.X/).

Option 2: Set up Nutch from a source distribution. Advanced users may…


What is Apache Nutch

Apache Nutch is a highly extensible and scalable open-source web crawler. Stemming from Apache Lucene, the project has diversified and now comprises two codebases:

Nutch 1.x: A mature, production-ready crawler. 1.x enables fine-grained configuration and relies on Apache Hadoop data structures, which are well suited to batch processing.

Nutch 2.x:…

