Class AutomatonURLFilter
- java.lang.Object
  - org.apache.nutch.urlfilter.api.RegexURLFilterBase
    - org.apache.nutch.urlfilter.automaton.AutomatonURLFilter
- All Implemented Interfaces:
Configurable, URLFilter, Pluggable
public class AutomatonURLFilter extends RegexURLFilterBase
RegexURLFilterBase implementation based on the dk.brics.automaton finite-state automata library for Java™.
- Author:
- Jérôme Charron
- See Also:
- dk.brics.automaton
-
-
Field Summary
Fields
Modifier and Type    Field                        Description
static String        URLFILTER_AUTOMATON_FILE
static String        URLFILTER_AUTOMATON_RULES
-
Fields inherited from class org.apache.nutch.urlfilter.api.RegexURLFilterBase
hasHostDomainRules
-
Fields inherited from interface org.apache.nutch.net.URLFilter
X_POINT_ID
-
-
Constructor Summary
Constructors
Constructor                            Description
AutomatonURLFilter()
AutomatonURLFilter(String filename)
-
Method Summary
Modifier and Type      Method                                                        Description
protected RegexRule    createRule(boolean sign, String regex)                        Creates a new RegexRule.
protected RegexRule    createRule(boolean sign, String regex, String hostOrDomain)   Creates a new RegexRule.
protected Reader       getRulesReader(Configuration conf)                            Rules specified as a config property will override rules specified as a config file.
static void            main(String[] args)
-
Methods inherited from class org.apache.nutch.urlfilter.api.RegexURLFilterBase
filter, getConf, main, setConf
-
Field Detail
-
URLFILTER_AUTOMATON_FILE
public static final String URLFILTER_AUTOMATON_FILE
- See Also:
- Constant Field Values
-
URLFILTER_AUTOMATON_RULES
public static final String URLFILTER_AUTOMATON_RULES
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
AutomatonURLFilter
public AutomatonURLFilter()
-
AutomatonURLFilter
public AutomatonURLFilter(String filename) throws IOException, PatternSyntaxException
- Throws:
IOException
PatternSyntaxException
-
-
Method Detail
-
getRulesReader
protected Reader getRulesReader(Configuration conf) throws IOException
Rules specified as a config property override rules specified in a config file.
- Specified by:
getRulesReader in class RegexURLFilterBase
- Parameters:
conf - the current configuration.
- Returns:
- a Reader for the resource containing the rules to use.
- Throws:
IOException - if there is a fatal error obtaining the Reader
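
The precedence described for getRulesReader can be sketched as follows. This is a minimal illustration under stated assumptions, not the Nutch implementation: a plain Map stands in for the Configuration object, the two property names are assumed values for URLFILTER_AUTOMATON_RULES and URLFILTER_AUTOMATON_FILE (this page does not list them), and the file is opened straight from disk. RulesReaderSketch and readAll are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class RulesReaderSketch {
    // Assumed property names; the actual constant values are not shown on this page.
    static final String RULES_KEY = "urlfilter.automaton.rules";
    static final String FILE_KEY = "urlfilter.automaton.file";

    /** Returns a Reader over the inline rules if present, otherwise over the rules file. */
    static Reader getRulesReader(Map<String, String> conf) throws IOException {
        String inlineRules = conf.get(RULES_KEY);
        if (inlineRules != null) {
            // Rules given as a property win, even when a file is also configured.
            return new StringReader(inlineRules);
        }
        String fileName = conf.get(FILE_KEY);
        if (fileName == null) {
            throw new IOException("No rules configured");
        }
        return Files.newBufferedReader(Paths.get(fileName), StandardCharsets.UTF_8);
    }

    /** Drains a Reader into a String. */
    static String readAll(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[1024];
        int count;
        while ((count = reader.read(buffer)) != -1) {
            text.append(buffer, 0, count);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> conf = new HashMap<>();
        conf.put(FILE_KEY, "automaton-urlfilter.txt"); // configured, but never opened below
        conf.put(RULES_KEY, "+.*");                    // illustrative rule text only
        try (Reader reader = getRulesReader(conf)) {
            System.out.println(readAll(reader));
        }
    }
}
```

Because the inline property is checked first, the file named under the file property is never opened when both are set.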
-
createRule
protected RegexRule createRule(boolean sign, String regex)
Description copied from class: RegexURLFilterBase
Creates a new RegexRule.
- Specified by:
createRule in class RegexURLFilterBase
- Parameters:
sign - the sign of the regular expression. A true value means that any URL matching this rule must be included, whereas a false value means that any URL matching this rule must be excluded.
regex - the regular expression associated with this rule.
- Returns:
RegexRule
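
To illustrate the sign semantics, here is a minimal sketch. It uses java.util.regex in place of dk.brics.automaton so that it runs without extra dependencies. It also assumes that a rule must match the whole URL, that the first matching rule decides the outcome, and that a URL matching no rule is rejected with a null result. None of these assumptions is stated on this page. SignSketch, SketchRule and filter are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class SignSketch {
    /** Hypothetical stand-in for RegexRule: a sign plus a compiled pattern. */
    static final class SketchRule {
        final boolean sign;
        final Pattern pattern;

        SketchRule(boolean sign, String regex) {
            this.sign = sign;
            this.pattern = Pattern.compile(regex);
        }

        /** Whole-string match, chosen for this sketch. */
        boolean matches(String url) {
            return pattern.matcher(url).matches();
        }
    }

    /** Returns the URL if it is included, or null if it is excluded (assumed convention). */
    static String filter(List<SketchRule> rules, String url) {
        for (SketchRule rule : rules) {
            if (rule.matches(url)) {
                // sign == true includes the URL, sign == false excludes it.
                return rule.sign ? url : null;
            }
        }
        return null; // no rule matched: treated as excluded in this sketch
    }

    public static void main(String[] args) {
        List<SketchRule> rules = Arrays.asList(
            new SketchRule(false, ".*\\.(gif|jpg|png)"),
            new SketchRule(true, "https?://example\\.org/.*"));
        System.out.println(filter(rules, "https://example.org/index.html"));
        System.out.println(filter(rules, "https://example.org/logo.png"));
    }
}
```

In this sketch the order of the rules matters: the exclusion for image suffixes is listed first, so it takes effect before the broader inclusion rule is tried.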
-
createRule
protected RegexRule createRule(boolean sign, String regex, String hostOrDomain)
Description copied from class: RegexURLFilterBase
Creates a new RegexRule.
- Specified by:
createRule in class RegexURLFilterBase
- Parameters:
sign - the sign of the regular expression. A true value means that any URL matching this rule must be included, whereas a false value means that any URL matching this rule must be excluded.
regex - the regular expression associated with this rule.
hostOrDomain - the host or domain to which this regex belongs
- Returns:
RegexRule
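
One way to picture a rule that belongs to a host or domain is sketched below. It is an illustration under assumptions, not the behaviour documented here: the rule is taken to apply only when the URL's host equals hostOrDomain or is a subdomain of it, and java.util.regex again stands in for dk.brics.automaton. HostScopedRuleSketch and its methods are hypothetical names.

```java
import java.net.URI;
import java.util.Locale;
import java.util.regex.Pattern;

public class HostScopedRuleSketch {
    final boolean sign;
    final Pattern pattern;
    final String hostOrDomain; // null means the rule applies to every host

    HostScopedRuleSketch(boolean sign, String regex, String hostOrDomain) {
        this.sign = sign;
        this.pattern = Pattern.compile(regex);
        this.hostOrDomain = hostOrDomain == null ? null : hostOrDomain.toLowerCase(Locale.ROOT);
    }

    /** Assumed scoping: exact host match or any subdomain of hostOrDomain. */
    boolean appliesTo(String url) {
        if (hostOrDomain == null) {
            return true;
        }
        // URI.create throws IllegalArgumentException for malformed input.
        String host = URI.create(url).getHost();
        if (host == null) {
            return false;
        }
        host = host.toLowerCase(Locale.ROOT);
        return host.equals(hostOrDomain) || host.endsWith("." + hostOrDomain);
    }

    /** The rule matches only if it is in scope and the regex matches the whole URL. */
    boolean matches(String url) {
        return appliesTo(url) && pattern.matcher(url).matches();
    }

    public static void main(String[] args) {
        HostScopedRuleSketch rule =
            new HostScopedRuleSketch(false, ".*/private/.*", "example.org");
        System.out.println(rule.matches("https://www.example.org/private/x"));
        System.out.println(rule.matches("https://other.net/private/x"));
    }
}
```

Scoping a rule this way lets a host-specific exclusion sit next to general rules without affecting URLs from other hosts.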
-
main
public static void main(String[] args) throws IOException
- Throws:
IOException
-
-