
Tag Archives: Apache

Two ways to install Nutch

Option 1: Set up Nutch from a binary distribution

Download a binary package (apache-nutch-1.X-bin.zip) from here.
Unzip your binary Nutch package. There should be a folder apache-nutch-1.X.
cd apache-nutch-1.X/
From now on, we are going to use ${NUTCH_RUNTIME_HOME} to refer to the current directory (apache-nutch-1.X/).

Option 2: Set up Nutch from a source distribution

Advanced users may…
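The Option 1 steps can be collected into a small shell function. This is a minimal sketch, not part of Nutch itself: the function name `nutch_setup_from_binary` and the "skip unzipping if the folder already exists" check are hypothetical conveniences, and `1.X` stands for whichever release you downloaded. It assumes `unzip` is installed and that you run it from the directory where the folder should be extracted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: unzip a Nutch binary package, cd into the resulting
# folder, and record that folder as NUTCH_RUNTIME_HOME.
nutch_setup_from_binary() {
  local archive="$1"                      # e.g. apache-nutch-1.X-bin.zip
  local dir
  dir="$(basename "$archive" -bin.zip)"   # apache-nutch-1.X-bin.zip -> apache-nutch-1.X

  # unzip extracts into the current directory; skip it if the folder is already there.
  if [ ! -d "$dir" ]; then
    unzip -q "$archive" || return 1
  fi

  cd "$dir" || return 1
  # From now on, ${NUTCH_RUNTIME_HOME} refers to this directory.
  NUTCH_RUNTIME_HOME="$(pwd)"
  export NUTCH_RUNTIME_HOME
}
```

For example, `nutch_setup_from_binary ~/Downloads/apache-nutch-1.X-bin.zip` leaves you inside `apache-nutch-1.X/` with `${NUTCH_RUNTIME_HOME}` pointing at it. Because the function changes directory, it must be sourced or defined in your current shell rather than run as a separate script.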


What is Apache Nutch

Apache Nutch is a highly extensible and scalable open-source web crawler project. Stemming from Apache Lucene, the project has diversified and now comprises two codebases, namely:

Nutch 1.x: A mature, production-ready crawler. 1.x enables fine-grained configuration and relies on Apache Hadoop data structures, which are well suited to batch processing.
Nutch 2.x:…

