Package org.apache.nutch.fetcher
Class FetchItem
- java.lang.Object
-
- org.apache.nutch.fetcher.FetchItem
-
public class FetchItem extends Object
This class describes the item to be fetched.
-
-
Constructor Summary
Constructors
Constructor: Description
FetchItem(Text url, URL u, CrawlDatum datum, String queueID)
FetchItem(Text url, URL u, CrawlDatum datum, String queueID, int outlinkDepth)
-
Method Summary
Modifier and Type, Method: Description
static FetchItem create(Text url, CrawlDatum datum, String queueMode): Create an item.
static FetchItem create(Text url, CrawlDatum datum, String queueMode, int outlinkDepth): Create an item.
CrawlDatum getDatum()
String getQueueID()
Text getUrl()
URL getURL2()
-
Constructor Detail
-
FetchItem
public FetchItem(Text url, URL u, CrawlDatum datum, String queueID)
-
FetchItem
public FetchItem(Text url, URL u, CrawlDatum datum, String queueID, int outlinkDepth)
-
-
Method Detail
-
create
public static FetchItem create(Text url, CrawlDatum datum, String queueMode)
Create an item. The queue ID is derived from the queueMode argument as a protocol + hostname pair, a protocol + IP address pair, or a protocol + domain pair. Sets the outlink depth to 0.
Parameters:
url - URL of the fetch item
datum - webpage information associated with the URL
queueMode - one of byHost, byDomain or byIP
Returns:
a FetchItem with an outlink depth of 0
See Also:
FetchItemQueues.QUEUE_MODE_DOMAIN, FetchItemQueues.QUEUE_MODE_HOST, FetchItemQueues.QUEUE_MODE_IP
-
create
public static FetchItem create(Text url, CrawlDatum datum, String queueMode, int outlinkDepth)
Create an item. The queue ID is derived from the queueMode argument as a protocol + hostname pair, a protocol + IP address pair, or a protocol + domain pair. The outlink depth is configurable.
Parameters:
url - URL of the fetch item
datum - webpage information associated with the URL
queueMode - one of byHost, byDomain or byIP
outlinkDepth - the desired outlink depth for this FetchItem
Returns:
a FetchItem
See Also:
FetchItemQueues.QUEUE_MODE_DOMAIN, FetchItemQueues.QUEUE_MODE_HOST, FetchItemQueues.QUEUE_MODE_IP
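Example (illustrative only): the sketch below shows one way a queue ID of the kind described for create could be derived from a URL and a queue mode, using only the JDK. It is not the Nutch implementation. The class QueueIdSketch and the method queueId are hypothetical names, the "://" separator between protocol and key is an assumption, and the byDomain branch naively keeps the last two host labels rather than consulting a public-suffix list. The byIP mode is left out because it would require a DNS lookup.

```java
import java.net.URI;
import java.util.Locale;

public class QueueIdSketch {

    // Hypothetical helper, not part of Nutch: builds "protocol://key",
    // where key is the host (byHost) or a naive domain (byDomain).
    static String queueId(String url, String queueMode) {
        URI u = URI.create(url);
        String proto = u.getScheme().toLowerCase(Locale.ROOT);
        String host = u.getHost().toLowerCase(Locale.ROOT);

        String key = host; // default: group by full host name
        if ("byDomain".equalsIgnoreCase(queueMode)) {
            // Assumption: the domain is the last two labels of the host.
            // This is wrong for suffixes such as "co.uk".
            String[] labels = host.split("\\.");
            if (labels.length > 2) {
                key = labels[labels.length - 2] + "." + labels[labels.length - 1];
            }
        }
        return proto + "://" + key;
    }

    public static void main(String[] args) {
        String url = "https://News.Example.com/a/b.html";
        System.out.println(queueId(url, "byHost"));
        System.out.println(queueId(url, "byDomain"));
    }
}
```

In this sketch, two URLs on different hosts of the same domain share a queue under byDomain but get separate queues under byHost.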
-
getDatum
public CrawlDatum getDatum()
-
getQueueID
public String getQueueID()
-
getUrl
public Text getUrl()
-
getURL2
public URL getURL2()
-