Package org.apache.nutch.parse.feed
Class FeedParser
- java.lang.Object
  - org.apache.nutch.parse.feed.FeedParser
Field Summary
Fields
Modifier and Type | Field | Description
static String | CHARSET_UTF8 |
static String | TEXT_PLAIN_CONTENT_TYPE |
Fields inherited from interface org.apache.nutch.parse.Parser
X_POINT_ID
Constructor Summary
Constructors
Constructor | Description
FeedParser() |
Method Summary
Modifier and Type | Method | Description
Configuration | getConf() |
ParseResult | getParse(Content content) | Parses the given feed and extracts and parses all linked items within the feed, using the underlying ROME feed parsing library.
static void | main(String[] args) | Runs a command-line version of this Parser.
void | setConf(Configuration conf) | Sets the Configuration object for this Parser.
Field Detail
CHARSET_UTF8
public static final String CHARSET_UTF8
See Also:
- Constant Field Values
TEXT_PLAIN_CONTENT_TYPE
public static final String TEXT_PLAIN_CONTENT_TYPE
See Also:
- Constant Field Values
Method Detail
getParse
public ParseResult getParse(Content content)
Parses the given feed and extracts and parses all linked items within the feed, using the underlying ROME feed parsing library.
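ROME's internals are not shown on this page. As a rough, dependency-free illustration of what extracting the linked items of a feed involves, the following sketch swaps in the JDK's built-in DOM parser for ROME and collects the title and link of every item element in an RSS 2.0 document. The class FeedItemSketch, its Item type, and the extractItems method are hypothetical names; this is not the FeedParser implementation, and unlike ROME it does not handle Atom or other feed formats.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: uses the JDK DOM parser instead of ROME and only
// understands RSS 2.0 <item> elements. Not Nutch's FeedParser code.
public class FeedItemSketch {

    // One extracted feed entry: its title and the URL it links to.
    public static final class Item {
        public final String title;
        public final String link;

        Item(String title, String link) {
            this.title = title;
            this.link = link;
        }
    }

    // Parses an RSS 2.0 document and returns the title/link of every <item>.
    public static List<Item> extractItems(String feedXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Reject DOCTYPE declarations so untrusted feeds cannot trigger XXE attacks.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(feedXml.getBytes(StandardCharsets.UTF_8)));

        List<Item> items = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("item");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element item = (Element) nodes.item(i);
            items.add(new Item(childText(item, "title"), childText(item, "link")));
        }
        return items;
    }

    // Trimmed text of the first descendant element with this name, or "" if absent.
    private static String childText(Element parent, String name) {
        NodeList matches = parent.getElementsByTagName(name);
        return matches.getLength() == 0 ? "" : matches.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String rss = "<rss version=\"2.0\"><channel><title>Example</title>"
            + "<item><title>First</title><link>http://example.com/1</link></item>"
            + "<item><title>Second</title><link>http://example.com/2</link></item>"
            + "</channel></rss>";
        for (Item item : extractItems(rss)) {
            System.out.println(item.title + " -> " + item.link);
        }
    }
}
```

Running main prints one "title -> link" line per item. The description above indicates that FeedParser goes further and also parses each linked item into the returned ParseResult; that step is not attempted in this sketch.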
setConf
public void setConf(Configuration conf)
Sets the Configuration object for this Parser. This Parser expects the following configuration properties to be set:
- URLNormalizers: properties in the configuration object used to set up the default URL normalizers.
- URLFilters: properties in the configuration object used to set up the default URL filters.
Specified by:
- setConf in interface Configurable
Parameters:
- conf: the Hadoop Configuration object used to configure this Parser.
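The setConf/getConf contract can be pictured with a small stand-in. In this hypothetical sketch, java.util.Properties plays the role of Hadoop's Configuration, and the property names sketch.normalizer.stripFragment and sketch.filter.accept are invented for illustration; they are not the real settings that Nutch's URLNormalizers and URLFilters read. setConf stores the object and derives a normalizer and a filter from it, and getConf returns the same object.

```java
import java.util.Properties;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

// Hypothetical sketch: Properties stands in for Hadoop's Configuration, and the
// "sketch.*" property names are invented; they are not real Nutch settings.
public class ConfigurableParserSketch {
    private Properties conf;
    private UnaryOperator<String> normalizer = UnaryOperator.identity();
    private Predicate<String> filter = url -> true;

    // Stores the configuration and builds the URL normalizer and filter from it.
    public void setConf(Properties conf) {
        this.conf = conf;
        boolean stripFragment = Boolean.parseBoolean(
            conf.getProperty("sketch.normalizer.stripFragment", "false"));
        if (stripFragment) {
            // Drop everything from the first '#' onward.
            normalizer = url -> {
                int hash = url.indexOf('#');
                return hash >= 0 ? url.substring(0, hash) : url;
            };
        } else {
            normalizer = UnaryOperator.identity();
        }
        // Accept only URLs matching this regular expression (default: everything).
        Pattern accept = Pattern.compile(conf.getProperty("sketch.filter.accept", ".*"));
        filter = url -> accept.matcher(url).matches();
    }

    // Returns the same object that was passed to setConf (null before setConf is called).
    public Properties getConf() {
        return conf;
    }

    // Normalizes a URL, then returns null if the filter rejects it.
    public String normalizeAndFilter(String url) {
        String normalized = normalizer.apply(url);
        return filter.test(normalized) ? normalized : null;
    }
}
```

Building the normalizer and filter once in setConf, rather than on every call, is one reasonable way to make URL handling depend only on the configuration the object was given.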
getConf
public Configuration getConf()
Specified by:
- getConf in interface Configurable
Returns:
- The Configuration object used to configure this Parser.