Package org.apache.nutch.fetcher
Class FetchItemQueues
- java.lang.Object
-
- org.apache.nutch.fetcher.FetchItemQueues
-
public class FetchItemQueues extends Object
A collection of queues that keeps track of the total number of items, and provides items eligible for fetching from any queue.
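The idea of a queue collection that tracks a global item count can be illustrated with a small, self-contained model. This is only a sketch and not the Nutch implementation: the class QueueCollectionSketch and its methods add and next are hypothetical, items are plain strings instead of FetchItem objects, and the eligibility rules of the real class (for example per-queue delays) are ignored.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of a collection of queues; not Nutch code.
class QueueCollectionSketch {
    // One FIFO queue per queue id, kept in insertion order.
    private final Map<String, Deque<String>> queues = new LinkedHashMap<>();
    // Total number of items over all queues.
    private int totalSize = 0;

    // Adds an item to the queue with the given id, creating the queue on demand.
    void add(String queueId, String item) {
        queues.computeIfAbsent(queueId, k -> new ArrayDeque<>()).addLast(item);
        totalSize++;
    }

    // Returns the next item from the first non-empty queue, or null if none is left.
    // Assumption: every queued item is eligible; the real class applies further checks.
    String next() {
        for (Deque<String> queue : queues.values()) {
            String item = queue.pollFirst();
            if (item != null) {
                totalSize--;
                return item;
            }
        }
        return null;
    }

    int getTotalSize() {
        return totalSize;
    }

    int getQueueCount() {
        return queues.size();
    }

    public static void main(String[] args) {
        QueueCollectionSketch sketch = new QueueCollectionSketch();
        sketch.add("a.example", "http://a.example/1");
        sketch.add("b.example", "http://b.example/1");
        System.out.println(sketch.getTotalSize() + " items in " + sketch.getQueueCount() + " queues");
    }
}
```

Keeping the total in a separate counter avoids summing all queue sizes on every call to getTotalSize().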
-
-
Field Summary
Fields
Modifier and Type    Field
static String        DEFAULT_ID
static String        QUEUE_MODE_DOMAIN
static String        QUEUE_MODE_HOST
static String        QUEUE_MODE_IP
-
Constructor Summary
Constructors
FetchItemQueues(Configuration conf)
-
Method Summary
Methods (modifier and type, method, description)
org.apache.nutch.fetcher.FetchItemQueues.QueuingStatus addFetchItem(Text url, CrawlDatum datum)
org.apache.nutch.fetcher.FetchItemQueues.QueuingStatus addFetchItem(FetchItem it)
int checkExceptionThreshold(String queueid)
    Increment the exception counter of a queue in case of an exception, e.g. timeout; when higher than a given threshold simply empty the queue.
int checkExceptionThreshold(String queueid, int maxExceptions, long delay)
    Increment the exception counter of a queue in case of an exception, e.g. timeout; when higher than a given threshold simply empty the queue.
protected static String checkQueueMode(String queueMode)
    Check whether queue mode is valid, fall back to default mode if not.
int checkTimelimit()
void dump()
int emptyQueues()
void finishFetchItem(FetchItem it)
void finishFetchItem(FetchItem it, boolean asap)
FetchItem getFetchItem()
FetchItemQueue getFetchItemQueue(String id)
int getQueueCount()
int getQueueCountMaxExceptions()
int getTotalSize()
boolean redirectIsQueuedRecently(Text redirUrl)
boolean timelimitExceeded()
-
-
-
Field Detail
-
DEFAULT_ID
public static final String DEFAULT_ID
- See Also:
- Constant Field Values
-
QUEUE_MODE_HOST
public static final String QUEUE_MODE_HOST
- See Also:
- Constant Field Values
-
QUEUE_MODE_DOMAIN
public static final String QUEUE_MODE_DOMAIN
- See Also:
- Constant Field Values
-
QUEUE_MODE_IP
public static final String QUEUE_MODE_IP
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
FetchItemQueues
public FetchItemQueues(Configuration conf)
-
-
Method Detail
-
checkQueueMode
protected static String checkQueueMode(String queueMode)
Check whether queue mode is valid, fall back to default mode if not.
- Parameters:
- queueMode - queue mode to check
- Returns:
- valid queue mode or default
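A validate-or-fall-back check of this kind might look like the following sketch. The class QueueModeSketch is hypothetical, and the string values of the three mode constants as well as the choice of the host mode as default are assumptions made for illustration; the authoritative values are listed under Constant Field Values.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a "validate or fall back" check; not Nutch code.
class QueueModeSketch {
    // Assumed constant values; see Constant Field Values for the real ones.
    static final String QUEUE_MODE_HOST = "byHost";
    static final String QUEUE_MODE_DOMAIN = "byDomain";
    static final String QUEUE_MODE_IP = "byIP";

    private static final List<String> VALID_MODES =
        Arrays.asList(QUEUE_MODE_HOST, QUEUE_MODE_DOMAIN, QUEUE_MODE_IP);

    // Returns queueMode if it is one of the known modes,
    // otherwise the default mode (assumed here to be the host mode).
    static String checkQueueMode(String queueMode) {
        if (queueMode != null && VALID_MODES.contains(queueMode)) {
            return queueMode;
        }
        return QUEUE_MODE_HOST;
    }

    public static void main(String[] args) {
        System.out.println(checkQueueMode("byDomain"));
        System.out.println(checkQueueMode("unknown"));
    }
}
```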
-
getTotalSize
public int getTotalSize()
-
getQueueCount
public int getQueueCount()
-
getQueueCountMaxExceptions
public int getQueueCountMaxExceptions()
-
addFetchItem
public org.apache.nutch.fetcher.FetchItemQueues.QueuingStatus addFetchItem(Text url, CrawlDatum datum)
-
addFetchItem
public org.apache.nutch.fetcher.FetchItemQueues.QueuingStatus addFetchItem(FetchItem it)
-
finishFetchItem
public void finishFetchItem(FetchItem it)
-
finishFetchItem
public void finishFetchItem(FetchItem it, boolean asap)
-
getFetchItemQueue
public FetchItemQueue getFetchItemQueue(String id)
-
getFetchItem
public FetchItem getFetchItem()
-
timelimitExceeded
public boolean timelimitExceeded()
- Returns:
- true if the fetcher time limit is defined and has been exceeded (fetcher.timelimit.mins minutes after fetching started)
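The return condition can be sketched as a deadline comparison. The class TimeLimitSketch below is hypothetical and passes the start time, the limit in minutes, and the current time explicitly so that the logic is easy to test; treating a non-positive limit as "not defined" is an assumption of this sketch.

```java
// Hypothetical sketch of a fetch time limit check; not Nutch code.
class TimeLimitSketch {
    // Absolute deadline in epoch milliseconds, or -1 if no limit is defined.
    private final long deadlineMillis;

    // startMillis: time fetching started; timelimitMins: limit in minutes
    // (assumption: a value <= 0 means that no limit is defined).
    TimeLimitSketch(long startMillis, long timelimitMins) {
        this.deadlineMillis = timelimitMins > 0
            ? startMillis + timelimitMins * 60L * 1000L
            : -1L;
    }

    // True only if a limit is defined and nowMillis has reached it.
    boolean timelimitExceeded(long nowMillis) {
        return deadlineMillis > 0 && nowMillis >= deadlineMillis;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        TimeLimitSketch limit = new TimeLimitSketch(start, 30);
        System.out.println(limit.timelimitExceeded(System.currentTimeMillis()));
    }
}
```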
-
checkTimelimit
public int checkTimelimit()
-
emptyQueues
public int emptyQueues()
-
checkExceptionThreshold
public int checkExceptionThreshold(String queueid, int maxExceptions, long delay)
Increment the exception counter of a queue in case of an exception, e.g. timeout; when higher than a given threshold simply empty the queue. The next fetch is delayed if specified by the param delay or configured by the property fetcher.exceptions.per.queue.delay.
- Parameters:
- queueid - a queue identifier to locate and check
- maxExceptions - custom-defined number of max. exceptions; if negative, the value of the property fetcher.max.exceptions.per.queue is used
- delay - a custom-defined time span in milliseconds to delay the next fetch in addition to the delay defined for the given queue. If a negative value is passed, the delay is chosen by fetcher.exceptions.per.queue.delay
- Returns:
- number of purged items
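The counting and purging behaviour described above can be modelled for a single queue as follows. This is a sketch under stated assumptions, not the Nutch implementation: the class ExceptionThresholdSketch is hypothetical, the delay handling is left out, the comparison follows the wording "higher than" literally, and a negative effective threshold is assumed to disable purging.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical single-queue model of an exception threshold; not Nutch code.
class ExceptionThresholdSketch {
    private final Deque<String> items = new ArrayDeque<>();
    private final int defaultMaxExceptions; // stands in for the configured property
    private int exceptionCount = 0;

    ExceptionThresholdSketch(int defaultMaxExceptions) {
        this.defaultMaxExceptions = defaultMaxExceptions;
    }

    void add(String item) {
        items.addLast(item);
    }

    int size() {
        return items.size();
    }

    // Increments the exception counter and empties the queue once the counter
    // is higher than the threshold. A negative maxExceptions selects the default.
    // Returns the number of purged items.
    int checkExceptionThreshold(int maxExceptions) {
        int limit = maxExceptions < 0 ? defaultMaxExceptions : maxExceptions;
        exceptionCount++;
        // Assumption: a negative effective limit means "never purge".
        if (limit < 0 || exceptionCount <= limit) {
            return 0;
        }
        int purged = items.size();
        items.clear();
        return purged;
    }

    public static void main(String[] args) {
        ExceptionThresholdSketch queue = new ExceptionThresholdSketch(1);
        queue.add("http://a.example/1");
        queue.add("http://a.example/2");
        System.out.println(queue.checkExceptionThreshold(-1));
        System.out.println(queue.checkExceptionThreshold(-1));
    }
}
```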
-
checkExceptionThreshold
public int checkExceptionThreshold(String queueid)
Increment the exception counter of a queue in case of an exception, e.g. timeout; when higher than a given threshold simply empty the queue.
- Parameters:
- queueid - queue identifier to locate and check
- Returns:
- number of purged items
- See Also:
checkExceptionThreshold(String, int, long)
-
redirectIsQueuedRecently
public boolean redirectIsQueuedRecently(Text redirUrl)
- Parameters:
- redirUrl - redirect target
- Returns:
- true if redirects are deduplicated and redirUrl has been queued recently
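One way to picture such a check is a time-windowed lookup. The sketch below is a hypothetical model, not the Nutch implementation: the class RedirectDedupSketch, the explicit nowMillis argument, the fixed time window, and the choice to record a URL on its first lookup are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of redirect deduplication; not Nutch code.
class RedirectDedupSketch {
    private final boolean dedupEnabled;
    private final long windowMillis; // how long a URL counts as "recent"
    private final Map<String, Long> queuedAt = new HashMap<>();

    RedirectDedupSketch(boolean dedupEnabled, long windowMillis) {
        this.dedupEnabled = dedupEnabled;
        this.windowMillis = windowMillis;
    }

    // Returns true if deduplication is enabled and redirUrl was seen within the
    // window. Otherwise remembers the URL (assumption of this sketch) and returns false.
    boolean redirectIsQueuedRecently(String redirUrl, long nowMillis) {
        if (!dedupEnabled) {
            return false;
        }
        Long last = queuedAt.get(redirUrl);
        if (last != null && nowMillis - last < windowMillis) {
            return true;
        }
        queuedAt.put(redirUrl, nowMillis);
        return false;
    }

    public static void main(String[] args) {
        RedirectDedupSketch dedup = new RedirectDedupSketch(true, 1000L);
        long now = System.currentTimeMillis();
        System.out.println(dedup.redirectIsQueuedRecently("http://a.example/", now));
        System.out.println(dedup.redirectIsQueuedRecently("http://a.example/", now));
    }
}
```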
-
dump
public void dump()
-
-