Package org.apache.nutch.parse
Class ParseData
- java.lang.Object
  - org.apache.hadoop.io.VersionedWritable
    - org.apache.nutch.parse.ParseData
- All Implemented Interfaces:
Writable
public final class ParseData extends VersionedWritable
Data extracted from a page's content.
- See Also:
Parse.getData()
-
-
Method Summary
Modifier and Type    Method                             Description
boolean              equals(Object o)
Metadata             getContentMeta()                   The original Metadata retrieved from content.
String               getMeta(String name)               Get a single metadata value.
Outlink[]            getOutlinks()                      Get the outlinks of the page.
Metadata             getParseMeta()                     Other content properties.
ParseStatus          getStatus()                        Get the status of parsing the page.
String               getTitle()                         Get the title of the page.
byte                 getVersion()
static void          main(String[] argv)
static ParseData     read(DataInput in)
void                 readFields(DataInput in)
void                 setOutlinks(Outlink[] outlinks)
void                 setParseMeta(Metadata parseMeta)
String               toString()
void                 write(DataOutput out)
-
-
-
Field Detail
-
DIR_NAME
public static final String DIR_NAME
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
ParseData
public ParseData()
-
ParseData
public ParseData(ParseStatus status, String title, Outlink[] outlinks, Metadata contentMeta)
-
ParseData
public ParseData(ParseStatus status, String title, Outlink[] outlinks, Metadata contentMeta, Metadata parseMeta)
-
-
Method Detail
-
getStatus
public ParseStatus getStatus()
Get the status of parsing the page.
- Returns:
- the ParseStatus
-
getOutlinks
public Outlink[] getOutlinks()
Get the outlinks of the page.
- Returns:
- an array of Outlink objects
-
getContentMeta
public Metadata getContentMeta()
The original Metadata retrieved from content.
- Returns:
- the original content Metadata
-
getParseMeta
public Metadata getParseMeta()
Other content properties. This is the place to find format-specific properties. Different parser implementations for different content types will populate this differently.
- Returns:
- a Metadata
-
setParseMeta
public void setParseMeta(Metadata parseMeta)
-
setOutlinks
public void setOutlinks(Outlink[] outlinks)
-
getMeta
public String getMeta(String name)
Get a single metadata value. This method first looks for the value in the parse metadata. If no value is found there, it then looks in the content metadata.
- Parameters:
name - the metadata key for which to retrieve a value
- Returns:
- the (string) metadata value
- See Also:
getContentMeta(), getParseMeta()
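The lookup order described for getMeta can be sketched with plain maps. This is a minimal illustration, not the Nutch source: the class name MetaLookupSketch and the use of java.util.Map in place of Metadata are assumptions made so the example runs without Nutch on the classpath. Returning null when neither map has the key is also an assumption, since the description above does not state what happens in that case.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in that mimics the documented getMeta lookup order. */
final class MetaLookupSketch {
    // Plain maps stand in for the parse and content Metadata objects.
    private final Map<String, String> parseMeta;
    private final Map<String, String> contentMeta;

    MetaLookupSketch(Map<String, String> parseMeta, Map<String, String> contentMeta) {
        this.parseMeta = new HashMap<>(parseMeta);
        this.contentMeta = new HashMap<>(contentMeta);
    }

    /** Looks in the parse metadata first, then falls back to the content metadata. */
    String getMeta(String name) {
        String value = parseMeta.get(name);
        if (value == null) {
            value = contentMeta.get(name);
        }
        return value; // assumption: null when the key is in neither map
    }

    public static void main(String[] args) {
        Map<String, String> parse = new HashMap<>();
        parse.put("title", "Parsed title");

        Map<String, String> content = new HashMap<>();
        content.put("title", "Header title");
        content.put("Content-Type", "text/html");

        MetaLookupSketch data = new MetaLookupSketch(parse, content);
        System.out.println(data.getMeta("title"));        // prints "Parsed title"
        System.out.println(data.getMeta("Content-Type")); // prints "text/html"
    }
}
```

A key present in both maps resolves to the parse-metadata value, so parser-supplied properties take precedence over the values carried by the original content.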
-
getVersion
public byte getVersion()
- Specified by:
getVersion in class VersionedWritable
-
readFields
public final void readFields(DataInput in) throws IOException
- Specified by:
readFields in interface Writable
- Overrides:
readFields in class VersionedWritable
- Throws:
IOException
-
write
public final void write(DataOutput out) throws IOException
- Specified by:
write in interface Writable
- Overrides:
write in class VersionedWritable
- Throws:
IOException
-
read
public static ParseData read(DataInput in) throws IOException
- Throws:
IOException
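The relationship between write, readFields, and the static read factory can be sketched with the standard java.io streams. This is a hedged illustration of the general versioned-record pattern, not the ParseData wire format: the class VersionedRecordSketch, its single title field, the VERSION value, and the roundTrip helper are all hypothetical, and the real class serializes more fields than are shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical record showing a version byte followed by the fields. */
final class VersionedRecordSketch {
    private static final byte VERSION = 1; // hypothetical version number
    private String title = "";

    VersionedRecordSketch() { }

    VersionedRecordSketch(String title) {
        this.title = title;
    }

    String getTitle() {
        return title;
    }

    /** Writes the version byte first, then the fields. */
    void write(DataOutput out) throws IOException {
        out.writeByte(VERSION);
        out.writeUTF(title);
    }

    /** Reads the version byte, rejects unknown versions, then reads the fields. */
    void readFields(DataInput in) throws IOException {
        byte version = in.readByte();
        if (version != VERSION) {
            throw new IOException("Unsupported version: " + version);
        }
        title = in.readUTF();
    }

    /** Static factory: creates an empty instance and fills it from the input. */
    static VersionedRecordSketch read(DataInput in) throws IOException {
        VersionedRecordSketch record = new VersionedRecordSketch();
        record.readFields(in);
        return record;
    }

    /** Serializes to an in-memory buffer and reads the record back. */
    static VersionedRecordSketch roundTrip(VersionedRecordSketch original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                original.write(out);
            }
            try (DataInputStream in =
                     new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return read(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        VersionedRecordSketch copy = roundTrip(new VersionedRecordSketch("Example page"));
        System.out.println(copy.getTitle()); // prints "Example page"
    }
}
```

The static read method saves callers from constructing an empty instance themselves before calling readFields, which is the usual reason such a factory sits alongside the instance method.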
-
-