Package org.apache.nutch.indexer
Index content: configure and run the indexing and cleaning jobs that
add, update, and delete documents in an index. Two tasks are
delegated to plugins:
- indexing filters, which fill index fields of each document
- index writer plugins, which send documents to index back-ends (Solr, etc.).
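
The division of labour between the two plugin kinds can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical: `FieldFilter`, `DocWriter`, `InMemoryWriter`, and the map-based document are simplified stand-ins invented for illustration, not the actual Nutch `IndexingFilter` or `IndexWriter` interfaces, whose real signatures differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these types are part of Nutch.
class IndexingPipelineSketch {

    /** Stand-in for an indexing filter: fills index fields of a document. */
    interface FieldFilter {
        void fill(String url, String text, Map<String, List<String>> doc);
    }

    /** Stand-in for an index writer: sends documents to an index back-end. */
    interface DocWriter {
        void write(Map<String, List<String>> doc);
        void delete(String url);
    }

    /** Toy back-end that keeps documents in a map keyed by their "id" field. */
    static class InMemoryWriter implements DocWriter {
        final Map<String, Map<String, List<String>>> index = new LinkedHashMap<>();

        @Override
        public void write(Map<String, List<String>> doc) {
            index.put(doc.get("id").get(0), doc);
        }

        @Override
        public void delete(String url) {
            index.remove(url);
        }
    }

    /** Appends a value to a multi-valued field, creating the field if needed. */
    static void add(Map<String, List<String>> doc, String field, String value) {
        doc.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
    }

    /** Runs every filter over the document, then hands it to every writer. */
    static Map<String, List<String>> index(String url, String text,
                                           List<FieldFilter> filters,
                                           List<DocWriter> writers) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        add(doc, "id", url);
        for (FieldFilter filter : filters) {
            filter.fill(url, text, doc);
        }
        for (DocWriter writer : writers) {
            writer.write(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        InMemoryWriter writer = new InMemoryWriter();
        List<FieldFilter> filters = List.of(
            (url, text, doc) -> add(doc, "content", text),
            (url, text, doc) -> add(doc, "length", String.valueOf(text.length())));

        index("http://example.org/", "hello", filters, List.of(writer));
        System.out.println(writer.index);

        // A cleaning pass would issue deletes for documents that are gone.
        writer.delete("http://example.org/");
        System.out.println(writer.index.isEmpty());
    }
}
```

Keeping filters and writers behind separate interfaces means field extraction can change without touching back-end code, and the same document can be sent to several back-ends.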
Interface Summary
Interface       Description
IndexingFilter  Extension point for indexing.
IndexWriter
Class Summary
Class                            Description
CleaningJob                      Scans CrawlDB for entries with status DB_GONE (404) or DB_DUPLICATE and sends delete requests to indexers for those documents.
CleaningJob.DBFilter
CleaningJob.DeleterReducer
IndexerMapReduce                 Typically invoked from within IndexingJob; handles all MapReduce functionality required for indexing.
IndexerMapReduce.IndexerMapper
IndexerMapReduce.IndexerReducer
IndexerOutputFormat
IndexingFilters                  Creates and caches plugins that implement IndexingFilter.
IndexingFiltersChecker           Reads and parses a URL and runs the indexers on it.
IndexingJob                      Generic indexer which relies on the plugins implementing IndexWriter.
IndexWriterConfig
IndexWriterParams
IndexWriters                     Creates and caches plugins that implement IndexWriter.
NutchDocument                    A NutchDocument is the unit of indexing.
NutchField                       Represents a multi-valued field with a weight.
NutchIndexAction                 A NutchIndexAction is the new unit of indexing, holding the document and action information.
Exception Summary
Exception          Description
IndexingException