Package org.apache.nutch.tools
Interface CommonCrawlFormat
-
- All Superinterfaces:
AutoCloseable, Closeable
- All Known Implementing Classes:
AbstractCommonCrawlFormat, CommonCrawlFormatJackson, CommonCrawlFormatJettinson, CommonCrawlFormatSimple, CommonCrawlFormatWARC
public interface CommonCrawlFormat extends Closeable
Interface for all CommonCrawl formatters. It declares the methods used to get JSON data.
- Author:
gtotaro
-
Method Summary
Modifier and Type | Method | Description
void | close() | Optional method that can be implemented if the format needs a close procedure.
List<String> | getInLinks() | Gets the list of inlinks.
String | getJsonData() | Get a string representation of the JSON structure of the URL content.
String | getJsonData(String url, Content content, Metadata metadata) | Returns a string representation of the JSON structure of the URL content.
String | getJsonData(String url, Content content, Metadata metadata, ParseData parseData) | Returns a string representation of the JSON structure of the URL content.
void | setInLinks(List<String> inLinks) | Sets the inlinks of this document.
-
Method Detail
-
getJsonData
String getJsonData() throws IOException
Get a string representation of the JSON structure of the URL content.
- Returns:
the JSON URL content string
- Throws:
IOException - if there is a fatal I/O error obtaining JSON data
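A minimal sketch of calling the no-argument getJsonData() and handling its IOException. JsonSource is a hypothetical stand-in that mirrors only this one signature so the example runs without Nutch on the classpath; it is not part of the Nutch API, and returning a fallback string is just one possible way to react to the error.

```java
import java.io.IOException;

// Hypothetical stand-in mirroring only the no-argument signature above.
interface JsonSource {
  String getJsonData() throws IOException;
}

class GetJsonDataSketch {
  // Returns the JSON string, or the given fallback if an IOException is thrown.
  static String jsonOrFallback(JsonSource source, String fallback) {
    try {
      return source.getJsonData();
    } catch (IOException e) {
      return fallback;
    }
  }

  public static void main(String[] args) {
    JsonSource ok = () -> "{\"url\":\"http://example.com/\"}";
    JsonSource failing = () -> { throw new IOException("simulated failure"); };
    System.out.println(jsonOrFallback(ok, "{}"));
    System.out.println(jsonOrFallback(failing, "{}"));
  }
}
```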
-
getJsonData
String getJsonData(String url, Content content, Metadata metadata) throws IOException
Returns a string representation of the JSON structure of the URL content. Takes into consideration both the Content and Metadata.
- Parameters:
url - the canonical URL
content - the URL Content
metadata - the URL Metadata
- Returns:
the JSON URL content string
- Throws:
IOException - if there is a fatal I/O error obtaining JSON data
-
getJsonData
String getJsonData(String url, Content content, Metadata metadata, ParseData parseData) throws IOException
Returns a string representation of the JSON structure of the URL content. Takes into consideration the Content, Metadata and ParseData.
- Parameters:
url - the canonical URL
content - the URL Content
metadata - the URL Metadata
parseData - the URL ParseData
- Returns:
the JSON URL content string
- Throws:
IOException - if there is a fatal I/O error obtaining JSON data
-
setInLinks
void setInLinks(List<String> inLinks)
Sets the inlinks of this document.
- Parameters:
inLinks - list of inlinks
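One way an implementation might store the inlinks is sketched below. InLinksHolder is a hypothetical class that mirrors only the setInLinks/getInLinks signatures; the defensive copy and the read-only view are assumptions of this sketch, not behavior documented for the Nutch implementations.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical holder mirroring only setInLinks/getInLinks from the interface.
class InLinksHolder {
  private List<String> inLinks = new ArrayList<>();

  // Copy the caller's list so later changes to it do not leak into the holder.
  public void setInLinks(List<String> inLinks) {
    if (inLinks == null) {
      this.inLinks = new ArrayList<>();
    } else {
      this.inLinks = new ArrayList<>(inLinks);
    }
  }

  // Expose a read-only view of the stored inlinks.
  public List<String> getInLinks() {
    return Collections.unmodifiableList(inLinks);
  }
}

class InLinksSketch {
  public static void main(String[] args) {
    List<String> links = new ArrayList<>();
    links.add("http://example.org/a");

    InLinksHolder holder = new InLinksHolder();
    holder.setInLinks(links);

    // Added after the call, so the holder's copy does not include it.
    links.add("http://example.org/b");
    System.out.println(holder.getInLinks());
  }
}
```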
-
close
void close()
Optional method that can be implemented if the format needs a close procedure.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
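Because the interface extends Closeable, a formatter can be managed with try-with-resources, and because close() is declared here without a throws clause, only getJsonData() contributes a checked exception. The sketch below illustrates this with QuietFormat and RecordingFormat, both hypothetical stand-ins that copy just these two signatures so the example compiles without Nutch.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in: extends Closeable and redeclares close()
// without a throws clause, as in the signature above.
interface QuietFormat extends Closeable {
  String getJsonData() throws IOException;

  @Override
  void close();
}

// Hypothetical implementation that records whether close() was called.
class RecordingFormat implements QuietFormat {
  boolean closed = false;

  @Override
  public String getJsonData() {
    return "{}";
  }

  @Override
  public void close() {
    closed = true;
  }
}

class CloseSketch {
  // Reads the JSON and closes the format, even if getJsonData() fails.
  static String readAndClose(QuietFormat format) {
    try (QuietFormat f = format) {
      return f.getJsonData();
    } catch (IOException e) {
      // Only getJsonData() can raise this; close() declares no checked exception.
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    RecordingFormat format = new RecordingFormat();
    System.out.println(readAndClose(format));
    System.out.println("closed: " + format.closed);
  }
}
```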
-