Package org.apache.nutch.parse
Interface Parser
All Superinterfaces:
Configurable, Pluggable
All Known Implementing Classes:
ExtParser, FeedParser, HtmlParser, JSParseFilter, TikaParser, ZipParser
public interface Parser extends Pluggable, Configurable
A parser for content generated by a Protocol implementation. This interface is implemented by extensions; Nutch's core contains no page-parsing code.
Field Summary
Modifier and Type | Field | Description
static String | X_POINT_ID | The name of the extension point.
Method Summary
Modifier and Type | Method | Description
ParseResult | getParse(Content c) | This method parses the given content and returns a map of <key, parse> pairs.
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
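A parser extension therefore has to supply getParse plus the getConf/setConf pair inherited from Configurable. The following is a minimal sketch of that shape under stated assumptions: it does not compile against Nutch, because Content, Parse, ParseResult and Hadoop's Configuration are replaced by hypothetical stand-ins (SimpleContent, SimpleParse, SimpleConf, SketchParser), the ParseResult return type is modeled as a plain Map, and the configuration key sketch.parser.max.length is invented for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Nutch's Content (NOT the real class).
final class SimpleContent {
  private final String url;
  private final byte[] bytes;

  SimpleContent(String url, byte[] bytes) {
    this.url = url;
    this.bytes = bytes;
  }

  String url() { return url; }
  byte[] bytes() { return bytes; }
}

// Hypothetical stand-in for Nutch's Parse: only the extracted text.
final class SimpleParse {
  private final String text;

  SimpleParse(String text) { this.text = text; }

  String text() { return text; }
}

// Hypothetical stand-in for Hadoop's Configuration: a key/value map.
final class SimpleConf {
  private final Map<String, String> props = new HashMap<>();

  String get(String key, String defaultValue) {
    return props.getOrDefault(key, defaultValue);
  }

  void set(String key, String value) { props.put(key, value); }
}

// Same shape as Parser: getParse plus the Configurable getConf/setConf pair.
// The ParseResult return type is modeled as a Map from key to parse.
interface SketchParser {
  Map<String, SimpleParse> getParse(SimpleContent c);

  void setConf(SimpleConf conf);

  SimpleConf getConf();
}

// Toy extension: decodes the bytes as UTF-8 text, truncates it to a
// configurable length, and returns one entry keyed by the content's URL.
final class PlainTextParser implements SketchParser {
  private SimpleConf conf = new SimpleConf(); // defaults until setConf is called

  @Override
  public void setConf(SimpleConf conf) { this.conf = conf; }

  @Override
  public SimpleConf getConf() { return conf; }

  @Override
  public Map<String, SimpleParse> getParse(SimpleContent c) {
    String text = new String(c.bytes(), StandardCharsets.UTF_8);
    // Invented key, used only to show the parser reading its configuration.
    int maxLen = Integer.parseInt(conf.get("sketch.parser.max.length", "1000"));
    if (text.length() > maxLen) {
      text = text.substring(0, maxLen);
    }
    Map<String, SimpleParse> result = new HashMap<>();
    result.put(c.url(), new SimpleParse(text));
    return result;
  }
}
```

Keying the single result by the content's own URL matches the convention the getParse description relies on for meta-redirects; a parser for container formats (such as archives) could presumably add further entries under other keys.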
Field Detail
X_POINT_ID
static final String X_POINT_ID
The name of the extension point.
Method Detail
getParse
ParseResult getParse(Content c)
This method parses the given content and returns a map of <key, parse> pairs.
Parse instances will be persisted under the given key.
Note: Meta-redirects should be followed only when they come from the original URL. That is:
Assume the fetcher is in parsing mode and is currently processing foo.bar.com/redirect.html. If this URL contains a meta redirect to another URL, the fetcher should follow the redirect only if the map contains an entry of the form <"foo.bar.com/redirect.html", Parse with a ParseStatus indicating the redirect>.
Parameters:
c - Content to be parsed
Returns:
a map containing <key, parse> pairs
Since:
NUTCH-443
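The meta-redirect rule reduces to a lookup keyed by the URL currently being fetched: a redirect counts only if the entry under that exact key carries a redirect status. The sketch below illustrates that check under stated assumptions; SketchStatus, SketchParseEntry and MetaRedirectRule are hypothetical simplifications of Nutch's Parse, ParseStatus and ParseResult, and this is not the fetcher's actual code.

```java
import java.util.Map;

// Hypothetical, simplified status codes; the real ParseStatus carries
// major/minor codes and arguments rather than this enum.
enum SketchStatus { SUCCESS, REDIRECT, FAILED }

// Hypothetical stand-in for a Parse together with its ParseStatus.
final class SketchParseEntry {
  private final SketchStatus status;
  private final String redirectTarget; // null unless status == REDIRECT

  SketchParseEntry(SketchStatus status, String redirectTarget) {
    this.status = status;
    this.redirectTarget = redirectTarget;
  }

  SketchStatus status() { return status; }
  String redirectTarget() { return redirectTarget; }
}

final class MetaRedirectRule {
  private MetaRedirectRule() {}

  // Returns the target to follow, or null when no redirect should be followed.
  // Only the entry keyed by the URL being fetched counts; a redirect reported
  // under any other key (for example a document nested inside the fetched
  // content) is ignored.
  static String redirectToFollow(String fetchedUrl, Map<String, SketchParseEntry> parseResult) {
    SketchParseEntry entry = parseResult.get(fetchedUrl);
    if (entry == null || entry.status() != SketchStatus.REDIRECT) {
      return null;
    }
    return entry.redirectTarget();
  }
}
```

With the example from the description, an entry keyed by "foo.bar.com/redirect.html" with a REDIRECT status yields its target, while a redirect reported only under some other key yields null.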