Package org.apache.nutch.indexer.feed
Class FeedIndexingFilter
- java.lang.Object
  - org.apache.nutch.indexer.feed.FeedIndexingFilter
- All Implemented Interfaces:
Configurable, IndexingFilter, Pluggable
public class FeedIndexingFilter extends Object implements IndexingFilter
- Since:
- NUTCH-444
An IndexingFilter implementation to pull out the relevant extracted Metadata fields from the RSS feeds and into the index.
- Author:
- dogacan, mattmann
Field Summary
Fields
- static String dateFormatStr
Fields inherited from interface org.apache.nutch.indexer.IndexingFilter
X_POINT_ID
Constructor Summary
Constructors
- FeedIndexingFilter()
Method Summary
Methods
- NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
  Extracts out the relevant fields (FEED_AUTHOR, FEED_TAGS, FEED_PUBLISHED, FEED_UPDATED, FEED) and sends them to the Indexer for indexing within the Nutch index.
- Configuration getConf()
- void setConf(Configuration conf)
  Sets the Configuration object used to configure this IndexingFilter.
Field Detail
-
dateFormatStr
public static final String dateFormatStr
- See Also:
- Constant Field Values
Method Detail
-
filter
public NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) throws IndexingException
Extracts the relevant fields:
- FEED_AUTHOR
- FEED_TAGS
- FEED_PUBLISHED
- FEED_UPDATED
- FEED
and sends them to the Indexer for indexing within the Nutch index.
- Specified by:
filter in interface IndexingFilter
- Parameters:
doc - document instance for collecting fields
parse - parse data instance
url - page url
datum - crawl datum for the page (fetch datum from segment containing fetch status and fetch time)
inlinks - page inlinks
- Returns:
- modified (or a new) document instance, or null (meaning the document should be discarded)
- Throws:
IndexingException - if an error occurs during filtering
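The filter contract described above can be illustrated with a minimal, self-contained sketch. It deliberately does not use the Nutch API: plain Map objects stand in for the parse metadata and for NutchDocument, and the class name FeedFilterSketch, the method copyFeedFields, and the lowercase key strings are hypothetical placeholders, not the actual values of the FEED_* constants.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the filtering step; not the Nutch implementation. */
public class FeedFilterSketch {

    // Placeholder keys (assumption): the real FEED_* constant values are not shown on this page.
    static final String[] FEED_KEYS = {"author", "tags", "published", "updated", "feed"};

    /**
     * Copies every feed-related metadata value into the document and
     * returns the modified document, mirroring the shape of the filter contract.
     */
    public static Map<String, List<String>> copyFeedFields(
            Map<String, List<String>> doc, Map<String, List<String>> parseMeta) {
        for (String key : FEED_KEYS) {
            List<String> values = parseMeta.get(key);
            if (values == null) {
                continue; // this field is absent from the feed entry
            }
            // Append to any values already stored under the same field name.
            doc.computeIfAbsent(key, k -> new ArrayList<>()).addAll(values);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, List<String>> meta = new LinkedHashMap<>();
        meta.put("author", List.of("dogacan"));
        meta.put("tags", List.of("search", "crawler"));
        meta.put("unrelated", List.of("ignored")); // not a feed key, so it is skipped

        Map<String, List<String>> doc = copyFeedFields(new LinkedHashMap<>(), meta);
        System.out.println(doc);
    }
}
```

Only the keys listed in FEED_KEYS reach the document; anything else in the metadata is left out, which is the selective behaviour the method description implies.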
getConf
public Configuration getConf()
- Specified by:
getConf in interface Configurable
- Returns:
- the Configuration object used to configure this IndexingFilter.
-
setConf
public void setConf(Configuration conf)
Sets the Configuration object used to configure this IndexingFilter.
- Specified by:
setConf in interface Configurable
- Parameters:
conf - The Configuration object used to configure this IndexingFilter.
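The setConf/getConf pair follows the usual configurable-object pattern: the caller hands the filter a configuration once, and the filter keeps a reference for later lookups. The sketch below imitates that pattern with stand-in types so that it compiles on its own. SimpleConf, ConfigurableSketch, and the property name example.key are hypothetical; the real class works with Hadoop's Configuration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, stdlib-only stand-in for the setConf/getConf pattern. */
public class ConfigurableSketch {

    /** Minimal stand-in for a configuration object (not Hadoop's Configuration). */
    public static class SimpleConf {
        private final Map<String, String> props = new HashMap<>();

        public void set(String name, String value) {
            props.put(name, value);
        }

        /** Returns the stored value, or defaultValue when the property is unset. */
        public String get(String name, String defaultValue) {
            return props.getOrDefault(name, defaultValue);
        }
    }

    private SimpleConf conf;

    /** Stores the configuration for later use by the filter. */
    public void setConf(SimpleConf conf) {
        this.conf = conf;
    }

    /** Returns the configuration previously passed to setConf. */
    public SimpleConf getConf() {
        return conf;
    }

    public static void main(String[] args) {
        SimpleConf conf = new SimpleConf();
        conf.set("example.key", "value"); // hypothetical property name

        ConfigurableSketch filter = new ConfigurableSketch();
        filter.setConf(conf);
        System.out.println(filter.getConf().get("example.key", "fallback"));
    }
}
```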