This site is archived.

Reloaded - How to build a Jobs Aggregation Engine with Nutch, Solr and Views...

Nutch Web Crawler
Code & Development

45 minutes (+15 minutes Q&A)

Tags: Views | Apache Solr | Nutch

Because I didn't get to go to DrupalCon SF to present in person, I thought European Drupalists would care to hear a revisited version of the speech.

Nutch is an open source web crawler that lets you do fine-grained or Internet-wide web crawling. In this session I will introduce you to the Drupal Nutch module, which helps with the setup and control of your crawls. We will combine this with some of the new features in the Apache Solr, Views 3 and Apache Solr Views modules to create a hybrid vertical search engine that interleaves your own content with supporting web content.

The agenda will be:

  1. An introduction to the Apache Nutch crawler
  2. An introduction to the features of the Drupal Nutch module
  3. Technical design decisions on combining crawled data with your Drupal data in Apache Solr
  4. Bringing it all together with a demo of a jobs aggregation search engine
  5. Questions

Resources