Package org.apache.nutch.urlfilter.regex
Class RegexURLFilter
- java.lang.Object
- org.apache.nutch.urlfilter.api.RegexURLFilterBase
- org.apache.nutch.urlfilter.regex.RegexURLFilter
- All Implemented Interfaces:
Configurable, URLFilter, Pluggable
- Direct Known Subclasses:
ExemptionUrlFilter
public class RegexURLFilter extends RegexURLFilterBase
Filters URLs based on a file of regular expressions using the Java Regex implementation.
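As a rough illustration of what "a file of regular expressions" can look like, the sketch below parses rule lines and compiles each expression with java.util.regex. The line format (a leading `+` for an include rule, a leading `-` for an exclude rule, `#` for comments) and the names `RuleFileSketch` and `ParsedRule` are assumptions made for this example; none of them are taken from this page, and the code does not use the Nutch classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical stand-alone sketch; it does not use the Nutch classes.
class RuleFileSketch {

    // One parsed line: sign (true = include, false = exclude) plus compiled regex.
    static final class ParsedRule {
        final boolean sign;
        final Pattern pattern;

        ParsedRule(boolean sign, String regex) throws PatternSyntaxException {
            this.sign = sign;
            this.pattern = Pattern.compile(regex);
        }
    }

    // Assumed format: blank lines and lines starting with '#' are skipped,
    // '+regex' is an include rule, '-regex' is an exclude rule.
    static List<ParsedRule> parse(String rulesText) {
        List<ParsedRule> rules = new ArrayList<>();
        for (String raw : rulesText.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char first = line.charAt(0);
            if (first != '+' && first != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new ParsedRule(first == '+', line.substring(1)));
        }
        return rules;
    }

    public static void main(String[] args) {
        String text = "# skip images\n-\\.(gif|jpg|png)$\n+^https?://\n";
        for (ParsedRule rule : parse(text)) {
            System.out.println((rule.sign ? "include " : "exclude ") + rule.pattern);
        }
    }
}
```

An invalid expression surfaces as a PatternSyntaxException from Pattern.compile, which is consistent with the exception declared on the RegexURLFilter(String filename) constructor.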
Field Summary
Fields
static String URLFILTER_REGEX_FILE
static String URLFILTER_REGEX_RULES
-
Fields inherited from class org.apache.nutch.urlfilter.api.RegexURLFilterBase
hasHostDomainRules
-
Fields inherited from interface org.apache.nutch.net.URLFilter
X_POINT_ID
-
-
Constructor Summary
Constructors
RegexURLFilter()
RegexURLFilter(String filename)
-
Method Summary
Methods
protected RegexRule createRule(boolean sign, String regex)
Creates a new RegexRule.
protected RegexRule createRule(boolean sign, String regex, String hostOrDomain)
Creates a new RegexRule.
protected Reader getRulesReader(Configuration conf)
Rules specified as a config property will override rules specified as a config file.
static void main(String[] args)
-
Methods inherited from class org.apache.nutch.urlfilter.api.RegexURLFilterBase
filter, getConf, main, setConf
Field Detail
-
URLFILTER_REGEX_FILE
public static final String URLFILTER_REGEX_FILE
- See Also:
- Constant Field Values
-
URLFILTER_REGEX_RULES
public static final String URLFILTER_REGEX_RULES
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
RegexURLFilter
public RegexURLFilter()
-
RegexURLFilter
public RegexURLFilter(String filename) throws IOException, PatternSyntaxException
- Throws:
IOException
PatternSyntaxException
-
-
Method Detail
-
getRulesReader
protected Reader getRulesReader(Configuration conf) throws IOException
Rules specified as a config property will override rules specified as a config file.
- Specified by:
getRulesReader in class RegexURLFilterBase
- Parameters:
conf - the current configuration.
- Returns:
a Reader for the resource containing the rules to use.
- Throws:
IOException - if there is a fatal error obtaining the Reader
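The precedence described for getRulesReader (an inline rules property wins over a rules file) can be sketched as follows. This is a minimal stand-alone illustration, not the Nutch implementation: a plain Map stands in for the Configuration object, the two property names are assumed values for URLFILTER_REGEX_RULES and URLFILTER_REGEX_FILE (this page does not show them), and the file is opened from the local file system as a simplification.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a Map stands in for the Configuration object.
class RulesReaderSketch {

    // Assumed property names, chosen for this example only.
    static final String RULES_KEY = "urlfilter.regex.rules";
    static final String FILE_KEY = "urlfilter.regex.file";

    // Inline rules win; the file is consulted only when no inline rules exist.
    static Reader openRules(Map<String, String> conf) throws IOException {
        String inlineRules = conf.get(RULES_KEY);
        if (inlineRules != null) {
            return new StringReader(inlineRules);
        }
        String file = conf.get(FILE_KEY);
        if (file == null) {
            throw new IOException("Neither " + RULES_KEY + " nor " + FILE_KEY + " is set");
        }
        return Files.newBufferedReader(Paths.get(file), StandardCharsets.UTF_8);
    }

    // Drains a Reader into a String so the result is easy to inspect.
    static String readAll(Reader reader) throws IOException {
        StringBuilder out = new StringBuilder();
        char[] buffer = new char[1024];
        int n;
        while ((n = reader.read(buffer)) != -1) {
            out.append(buffer, 0, n);
        }
        reader.close();
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> conf = new HashMap<>();
        // The file is never opened here because inline rules take precedence.
        conf.put(FILE_KEY, "regex-urlfilter.txt");
        conf.put(RULES_KEY, "+^https?://example\\.org/\n");
        System.out.print(readAll(openRules(conf)));
    }
}
```

Checking the property before touching the file means a missing or unreadable file does not matter when inline rules are supplied.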
-
createRule
protected RegexRule createRule(boolean sign, String regex)
Description copied from class: RegexURLFilterBase
Creates a new RegexRule.
- Specified by:
createRule in class RegexURLFilterBase
- Parameters:
sign - the sign of the regular expression. A true value means that any URL matching this rule must be included, whereas a false value means that any URL matching this rule must be excluded.
regex - the regular expression associated with this rule.
- Returns:
RegexRule
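To make the meaning of the sign parameter concrete, here is a minimal sketch of a signed rule and how a list of such rules might decide a URL. `SignedRule` and `filter` are hypothetical names and are not the Nutch RegexRule or filter method. Two behaviors are assumptions of this example rather than statements from this page: the first matching rule decides, and a URL that matches no rule is rejected.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch; SignedRule is not the Nutch RegexRule class.
class SignedRuleSketch {

    static final class SignedRule {
        final boolean sign;      // true = include matching URLs, false = exclude them
        final Pattern pattern;

        SignedRule(boolean sign, String regex) {
            this.sign = sign;
            this.pattern = Pattern.compile(regex);
        }

        // Assumption: a rule matches when the regex is found anywhere in the URL.
        boolean matches(String url) {
            return pattern.matcher(url).find();
        }
    }

    // Assumption: the first matching rule decides, and a URL that matches
    // no rule is rejected. Returns the URL when accepted, null when rejected.
    static String filter(List<SignedRule> rules, String url) {
        for (SignedRule rule : rules) {
            if (rule.matches(url)) {
                return rule.sign ? url : null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<SignedRule> rules = Arrays.asList(
            new SignedRule(false, "\\.(gif|jpg|png)$"),
            new SignedRule(true, "^https?://"));
        System.out.println(filter(rules, "https://example.org/index.html"));
        System.out.println(filter(rules, "https://example.org/logo.png"));
    }
}
```

With these two rules, the HTML page is returned unchanged and the PNG is rejected, because the exclude rule is listed first and matches before the include rule is tried.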
-
createRule
protected RegexRule createRule(boolean sign, String regex, String hostOrDomain)
Description copied from class: RegexURLFilterBase
Creates a new RegexRule.
- Specified by:
createRule in class RegexURLFilterBase
- Parameters:
sign - the sign of the regular expression. A true value means that any URL matching this rule must be included, whereas a false value means that any URL matching this rule must be excluded.
regex - the regular expression associated with this rule.
hostOrDomain - the host or domain to which this regex belongs
- Returns:
RegexRule
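One way to picture a rule that "belongs" to a host or domain is a rule that is only consulted for URLs on that host. The sketch below is an assumption-heavy illustration: `HostScopedRule` is a hypothetical class, the host is extracted with java.net.URI, and a "domain" is approximated as a host-name suffix. How Nutch actually resolves hosts and domains is not described on this page.

```java
import java.net.URI;
import java.util.regex.Pattern;

// Hypothetical sketch; not the Nutch RegexRule class.
class HostScopedRuleSketch {

    static final class HostScopedRule {
        final boolean sign;          // true = include, false = exclude
        final Pattern pattern;
        final String hostOrDomain;   // null means the rule applies to every host

        HostScopedRule(boolean sign, String regex, String hostOrDomain) {
            this.sign = sign;
            this.pattern = Pattern.compile(regex);
            this.hostOrDomain = hostOrDomain;
        }

        // Simplification: a domain is treated as a suffix of the host name.
        // URI.create throws IllegalArgumentException for malformed URLs.
        boolean appliesTo(String url) {
            if (hostOrDomain == null) {
                return true;
            }
            String host = URI.create(url).getHost();
            return host != null
                && (host.equals(hostOrDomain) || host.endsWith("." + hostOrDomain));
        }

        // The regex is only evaluated when the rule applies to the URL's host.
        boolean matches(String url) {
            return appliesTo(url) && pattern.matcher(url).find();
        }
    }

    public static void main(String[] args) {
        HostScopedRule rule = new HostScopedRule(false, "/private/", "example.org");
        System.out.println(rule.matches("https://www.example.org/private/a.html"));
        System.out.println(rule.matches("https://other.net/private/a.html"));
    }
}
```

Scoping a rule this way lets a rule set carry site-specific expressions without evaluating them against every URL from unrelated hosts.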
-
main
public static void main(String[] args) throws IOException
- Throws:
IOException