Class ArbitraryIndexingFilter
java.lang.Object
    org.apache.nutch.indexer.arbitrary.ArbitraryIndexingFilter

All Implemented Interfaces:
Configurable, IndexingFilter, Pluggable
public class ArbitraryIndexingFilter extends Object implements IndexingFilter
Adds arbitrary searchable fields to a document from the class and method the user identifies in the config. The user supplies the name of the field to add with the class and method names that supply the value. Example:
<property>
<name>index.arbitrary.function.count</name>
<value>1</value>
</property>
<property>
<name>index.arbitrary.fieldName.0</name>
<value>advisors</value>
</property>
<property>
<name>index.arbitrary.className.0</name>
<value>com.example.arbitrary.AdvisorCalculator</value>
</property>
<property>
<name>index.arbitrary.constructorArgs.0</name>
<value>Kirk</value>
</property>
<property>
<name>index.arbitrary.methodName.0</name>
<value>countAdvisors</value>
</property>
<property>
<name>index.arbitrary.methodArgs.0</name>
<value>Spock,McCoy</value>
</property>
To set more than one arbitrary field value, increment index.arbitrary.function.count and repeat the rest of these blocks with successive int values appended to the property names, e.g. fieldName.1, methodName.1, etc.
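The numbering scheme described above can be sketched as a loop over the count. This is a minimal illustration only: it uses java.util.Properties as a stand-in for the Hadoop Configuration object so that it runs on its own, and the class IndexedPropertySketch, the helper fieldNames, and the second field name "rank" are hypothetical names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of Nutch: shows how the numbered
// property names line up with index.arbitrary.function.count.
class IndexedPropertySketch {
  static final String PREFIX = "index.arbitrary.";

  // Collects fieldName.0 .. fieldName.(count - 1) from the given properties.
  static List<String> fieldNames(Properties conf) {
    int count = Integer.parseInt(conf.getProperty(PREFIX + "function.count", "0"));
    List<String> names = new ArrayList<>();
    for (int ndx = 0; ndx < count; ndx++) {
      names.add(conf.getProperty(PREFIX + "fieldName." + ndx));
    }
    return names;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty("index.arbitrary.function.count", "2");
    conf.setProperty("index.arbitrary.fieldName.0", "advisors");
    conf.setProperty("index.arbitrary.fieldName.1", "rank"); // invented second field
    System.out.println(fieldNames(conf)); // prints [advisors, rank]
  }
}
```

The same pattern would apply to the className, constructorArgs, methodName, and methodArgs properties, each read with the same ordinal suffix.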
Field Summary
-
Fields inherited from interface org.apache.nutch.indexer.IndexingFilter
X_POINT_ID
Constructor Summary
Constructor, Description:
ArbitraryIndexingFilter()
-
Method Summary
Modifier and Type, Method, Description:
NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
    The ArbitraryIndexingFilter filter object uses reflection to instantiate the configured class and invoke the configured method.
Configuration getConf()
    Get the Configuration object.
void setConf(Configuration conf)
    Set the Configuration object.
void setIndexedConf(Configuration conf, int ndx)
    Set the Configuration object for a specific set of values in the config.
Method Detail
-
filter
public NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) throws IndexingException
The ArbitraryIndexingFilter filter object uses reflection to instantiate the configured class and invoke the configured method. It requires a few configuration settings for adding arbitrary fields and values to the NutchDocument as searchable fields. See index.arbitrary.function.count and, possibly in multiple instances when index.arbitrary.function.count > 1, the following properties in nutch-default.xml or nutch-site.xml: index.arbitrary.fieldName.index, index.arbitrary.className.index, index.arbitrary.constructorArgs.index, index.arbitrary.methodName.index, and index.arbitrary.methodArgs.index, where index ranges from 0 to index.arbitrary.function.count - 1.
Specified by:
filter in interface IndexingFilter
Parameters:
doc - The NutchDocument object
parse - The relevant Parse object passing through the filter
url - URL to be filtered by the user-specified class
datum - The CrawlDatum entry
inlinks - The Inlinks containing anchor text
Returns:
filtered NutchDocument
Throws:
IndexingException - if an error occurs during filtering
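The reflective instantiate-then-invoke step described for filter can be sketched in plain Java. This is an illustration under stated assumptions, not the plugin's actual code: this page does not specify the constructor and method signatures the filter expects, so the sketch assumes both take a String array. The body of AdvisorCalculator and the helper class ArbitraryFieldSketch are invented for the example.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;

// Hypothetical user class; the name matches the config example, the body is invented.
class AdvisorCalculator {
  private final String[] constructorArgs;

  // Assumption: constructor arguments arrive as a String array.
  public AdvisorCalculator(String[] constructorArgs) {
    this.constructorArgs = constructorArgs;
  }

  // Assumption: method arguments arrive as a String array and the
  // return value becomes the value of the arbitrary field.
  public Object countAdvisors(String[] methodArgs) {
    return constructorArgs.length + methodArgs.length;
  }
}

// Hypothetical helper showing the reflection calls; not part of Nutch.
class ArbitraryFieldSketch {
  static Object compute(String className, String[] ctorArgs,
      String methodName, String[] methodArgs) throws ReflectiveOperationException {
    Class<?> clazz = Class.forName(className);
    Constructor<?> ctor = clazz.getDeclaredConstructor(String[].class);
    // Cast to Object so the array is passed as one argument, not spread as varargs.
    Object instance = ctor.newInstance((Object) ctorArgs);
    Method method = clazz.getDeclaredMethod(methodName, String[].class);
    return method.invoke(instance, (Object) methodArgs);
  }

  public static void main(String[] args) throws Exception {
    Object value = compute(AdvisorCalculator.class.getName(),
        "Kirk".split(","), "countAdvisors", "Spock,McCoy".split(","));
    System.out.println("advisors = " + value); // prints advisors = 3
  }
}
```

The comma-split of the configured values mirrors the constructorArgs and methodArgs entries in the example config. How the real filter parses those values is not stated on this page.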
-
setConf
public void setConf(Configuration conf)
Set the Configuration object.
Specified by:
setConf in interface Configurable
-
setIndexedConf
public void setIndexedConf(Configuration conf, int ndx)
Set the Configuration object for a specific set of values in the config.
Parameters:
conf - The Configuration object holding values for the current arbitrary field.
ndx - The ordinal counter value for the current arbitrary field appended to the base property names in the xml configuration file.
-
getConf
public Configuration getConf()
Get the Configuration object.
Specified by:
getConf in interface Configurable