Class ThrottledFetcher


  • public class ThrottledFetcher
    extends java.lang.Object
    This class uses HttpClient to fetch content from web servers. In addition, it controls the fetch rate in two ways: it limits the overall bandwidth used per server, and it limits the number of simultaneously open connections per server. Because these limits describe long-term behavior, an instance of this class would most likely need a lifetime consistent with them, and would therefore be static.
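
    The per-server connection limit described above can be illustrated with a minimal sketch. This is not the ManifoldCF implementation: the class name PerServerLimiter and its methods are hypothetical, and the use of a static map of semaphores is an assumption chosen to show why the throttling state needs a static, long-lived home.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical illustration only; not part of ManifoldCF.
final class PerServerLimiter {
  // Static so the limits survive across individual fetches and threads.
  private static final ConcurrentHashMap<String, Semaphore> PERMITS =
      new ConcurrentHashMap<>();

  private PerServerLimiter() {}

  // Try to reserve one connection slot for the server without blocking.
  // The limit is fixed the first time a server is seen.
  static boolean tryAcquire(String server, int connectionLimit) {
    Semaphore slots =
        PERMITS.computeIfAbsent(server, k -> new Semaphore(connectionLimit));
    return slots.tryAcquire();
  }

  // Return a previously reserved slot for the server.
  static void release(String server) {
    Semaphore slots = PERMITS.get(server);
    if (slots != null) {
      slots.release();
    }
  }
}
```

    In this sketch a caller would pair every successful tryAcquire with a release once the fetch completes, typically in a finally block.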
    • Method Detail

      • getConnection

        public static IThrottledConnection getConnection(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext,
                                                         java.lang.String throttleGroupName,
                                                         java.lang.String protocol,
                                                         java.lang.String server,
                                                         int port,
                                                         PageCredentials authentication,
                                                         org.apache.manifoldcf.connectorcommon.interfaces.IKeystoreManager trustStore,
                                                         org.apache.manifoldcf.connectorcommon.interfaces.IThrottleSpec throttleDescription,
                                                         java.lang.String[] binNames,
                                                         int connectionLimit,
                                                         java.lang.String proxyHost,
                                                         int proxyPort,
                                                         java.lang.String proxyAuthDomain,
                                                         java.lang.String proxyAuthUsername,
                                                         java.lang.String proxyAuthPassword,
                                                         int socketTimeoutMilliseconds,
                                                         int connectionTimeoutMilliseconds,
                                                         org.apache.manifoldcf.crawler.interfaces.IAbortActivity activities)
                                                  throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                                         org.apache.manifoldcf.agents.interfaces.ServiceInterruption
        Obtain a connection to the specified protocol, server, and port. The protocol is taken into account because setup for some protocols (e.g. https) is extensive, and distinguishing connections by protocol should avoid repeating that setup.
        Parameters:
        protocol - is the protocol, e.g. "http"
        server - is the server IP address, e.g. "10.32.65.1"
        port - is the port to connect to, e.g. 80. Pass -1 if the default port for the protocol is desired.
        authentication - is the page credentials object to use for the fetch. If null, no credentials are available.
        trustStore - is the current trust store in effect for the fetch.
        binNames - is the ordered set of bins that should be used for throttling this connection. Note that the bin names for a given IP address and port MUST be the same for every connection! Whatever builds the bins must enforce this by deriving them from the IP address and port.
        throttleDescription - is the description of all the throttling that should take place.
        connectionLimit - is the maximum number of connections permitted.
        Returns:
        an IThrottledConnection object that can be used to fetch from the specified server and port.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
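
        Two conventions in the parameter list above lend themselves to a short sketch: passing -1 for the port to request the protocol default, and distinguishing connections by protocol, server, and port. The class ConnectionKeys and its methods are hypothetical and are not part of ManifoldCF; the default ports 80 and 443 are the standard ones for http and https, and the key format is an assumption made for illustration.

```java
import java.util.Locale;

// Hypothetical illustration only; not part of ManifoldCF.
final class ConnectionKeys {
  private ConnectionKeys() {}

  // Map -1 to the standard default port for the protocol.
  static int resolvePort(String protocol, int port) {
    if (port != -1) {
      return port;
    }
    switch (protocol.toLowerCase(Locale.ROOT)) {
      case "http":
        return 80;
      case "https":
        return 443;
      default:
        throw new IllegalArgumentException(
            "No default port known for protocol: " + protocol);
    }
  }

  // Build a key that is identical for identical (protocol, server, port),
  // so that connections with expensive setup can be told apart and reused.
  static String connectionKey(String protocol, String server, int port) {
    return protocol.toLowerCase(Locale.ROOT)
        + "://" + server + ":" + resolvePort(protocol, port);
  }
}
```

        Because the key depends only on its three inputs, an explicit port and the equivalent default (for example 80 and -1 for http) produce the same key in this sketch.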
      • flushIdleConnections

        public static void flushIdleConnections(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext)
                                         throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Flush connections that have timed out from inactivity.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
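
        As a rough sketch of what flushing idle connections can involve, the example below keeps a last-used timestamp for each pooled connection and discards those idle for at least a configured period. The class IdlePool, its methods, and the idle threshold are all hypothetical; how ManifoldCF tracks and closes idle connections is not stated here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Hypothetical illustration only; not part of ManifoldCF.
final class IdlePool {
  private static final class Entry {
    final String id;
    final long lastUsedMillis;

    Entry(String id, long lastUsedMillis) {
      this.id = id;
      this.lastUsedMillis = lastUsedMillis;
    }
  }

  private final Deque<Entry> idle = new ArrayDeque<>();
  private final long maxIdleMillis;

  IdlePool(long maxIdleMillis) {
    this.maxIdleMillis = maxIdleMillis;
  }

  // Record that a connection was returned to the pool at the given time.
  synchronized void checkIn(String id, long nowMillis) {
    idle.addLast(new Entry(id, nowMillis));
  }

  // Drop every connection idle for at least maxIdleMillis; return the count.
  // A real pool would also close the underlying socket here.
  synchronized int flushIdle(long nowMillis) {
    int flushed = 0;
    Iterator<Entry> it = idle.iterator();
    while (it.hasNext()) {
      Entry entry = it.next();
      if (nowMillis - entry.lastUsedMillis >= maxIdleMillis) {
        it.remove();
        flushed++;
      }
    }
    return flushed;
  }

  synchronized int size() {
    return idle.size();
  }
}
```

        Passing the current time as a parameter keeps the sketch deterministic; a caller would normally supply System.currentTimeMillis() and invoke flushIdle periodically.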