Class RSSConnector.Filter

  • Enclosing class:
    RSSConnector

    protected static class RSSConnector.Filter
    extends java.lang.Object
    Class that handles parsing and interpretation of the document specification. Gathering all the data in a single pass is believed to be faster than scanning the document specification multiple times, so this class holds the *entire* interpreted set of data from a document specification.
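The "interpret once, query many times" approach described above can be sketched as follows. This is a minimal, hypothetical illustration and not the connector's actual code: the `SpecSketch` class, its `SpecEntry` type, and the entry type names `"seed"`, `"access"`, and `"exclude"` are assumptions standing in for ManifoldCF's `Specification` nodes. URL canonicalization is also omitted; seeds are only trimmed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical sketch: interpret a specification once, then answer queries from the parsed state. */
public class SpecSketch {
    /** Hypothetical stand-in for one specification node: a type name plus a value. */
    public static final class SpecEntry {
        final String type;
        final String value;

        public SpecEntry(String type, String value) {
            this.type = type;
            this.value = value;
        }
    }

    private final Set<String> seeds = new HashSet<>();
    private final Set<String> acls = new HashSet<>();
    private final List<Pattern> excludePatterns = new ArrayList<>();

    /** Single pass over the specification; every later query reads the collected fields. */
    public SpecSketch(List<SpecEntry> spec) {
        for (SpecEntry e : spec) {
            switch (e.type) {
                case "seed":
                    // Assumption: trimming stands in for real URL canonicalization.
                    seeds.add(e.value.trim());
                    break;
                case "access":
                    acls.add(e.value);
                    break;
                case "exclude":
                    // Compile each regular expression once, up front.
                    excludePatterns.add(Pattern.compile(e.value));
                    break;
                default:
                    // Unknown entry types are ignored in this sketch.
                    break;
            }
        }
    }

    public boolean isSeed(String canonicalUrl) {
        return seeds.contains(canonicalUrl);
    }

    public String[] getAcls() {
        return acls.toArray(new String[0]);
    }

    public int excludePatternCount() {
        return excludePatterns.size();
    }
}
```

Because the specification is walked only once in the constructor, each later `isSeed` call is a hash-set lookup rather than a rescan of the specification.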
    • Field Detail

      • seeds

        protected final java.util.Set<java.lang.String> seeds
      • defaultRescanInterval

        protected java.lang.Integer defaultRescanInterval
      • minimumRescanInterval

        protected java.lang.Integer minimumRescanInterval
      • badFeedRescanInterval

        protected java.lang.Integer badFeedRescanInterval
      • dechromedContentMode

        protected int dechromedContentMode
      • chromedContentMode

        protected int chromedContentMode
      • feedTimeoutValue

        protected int feedTimeoutValue
      • acls

        protected final java.util.Set<java.lang.String> acls
      • excludePatterns

        protected final java.util.List<java.util.regex.Pattern> excludePatterns
        The list of exclude patterns.
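A `List<Pattern>` such as `excludePatterns` is typically consulted by testing each compiled pattern against a URL. The sketch below assumes that a URL is excluded when any pattern is found anywhere in it (`Matcher.find()` rather than `Matcher.matches()`). Whether the connector uses substring or full-match semantics is not stated on this page, and `ExcludeSketch` and `isExcluded` are hypothetical names.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of checking a URL against a list of compiled exclude patterns. */
public final class ExcludeSketch {
    private ExcludeSketch() {}

    /** Returns true if any pattern occurs somewhere in the URL (substring search via find()). */
    public static boolean isExcluded(List<Pattern> excludePatterns, String url) {
        for (Pattern p : excludePatterns) {
            if (p.matcher(url).find()) {
                return true; // the first hit is enough; remaining patterns are skipped
            }
        }
        return false;
    }
}
```

Compiling the patterns once, when the specification is interpreted, avoids recompiling a regular expression for every URL that is checked.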
    • Constructor Detail

      • Filter

        public Filter(org.apache.manifoldcf.core.interfaces.Specification spec,
                      boolean warnOnBadSeed)
               throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Constructor. Interprets the given document specification once and stores the results in this object.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
    • Method Detail

      • isSeed

        public boolean isSeed(java.lang.String canonicalUrl)
        Check whether the document with the given canonical URL is a seed.
      • getSeeds

        public java.util.Iterator<java.lang.String> getSeeds()
        Iterate over all canonicalized seeds
      • getAcls

        public java.lang.String[] getAcls()
        Get the acls
      • getFeedTimeoutValue

        public int getFeedTimeoutValue()
        Get the feed timeout value
      • getDechromedContentMode

        public int getDechromedContentMode()
        Get the dechromed content mode
      • getChromedContentMode

        public int getChromedContentMode()
        Get the chromed content mode
      • getDefaultRescanTime

        public java.lang.Long getDefaultRescanTime(long currentTime)
        Get the time at which a feed should next be scanned by default.
      • getMinimumRescanTime

        public java.lang.Long getMinimumRescanTime(long currentTime)
        Get the earliest time at which a feed should next be scanned.
      • getBadFeedRescanTime

        public java.lang.Long getBadFeedRescanTime(long currentTime)
        Get the time at which a "bad feed" should next be rescanned.
      • isLegalURL

        public boolean isLegalURL(java.lang.String url)
        Check whether a URL is legal.
        Returns:
        true if the passed-in URL is either a seed or a legal URL according to this filter.
      • mapDocumentURL

        public java.lang.String mapDocumentURL(java.lang.String url)
                                        throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
        Scan the patterns and map the URL using the first one that matches.
        Returns:
        null if the URL does not match or should not be ingested; otherwise, the mapped URL string.
        Throws:
        org.apache.manifoldcf.core.interfaces.ManifoldCFException
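The three `get...RescanTime` methods take a `long currentTime` and return a `java.lang.Long`, while the corresponding interval fields are nullable `java.lang.Integer` values. One plausible reading, sketched below, is that a `null` interval yields `null` (no scheduled rescan) and otherwise the interval is added to `currentTime`. The minute unit for intervals and the millisecond timestamps are assumptions of this sketch, not documented above, and `RescanSketch` and `nextScanTime` are hypothetical names.

```java
/** Hypothetical sketch of turning a nullable rescan interval into an absolute next-scan time. */
public final class RescanSketch {
    // Assumption: intervals are in minutes and timestamps are milliseconds since the epoch.
    private static final long MILLIS_PER_MINUTE = 60_000L;

    private RescanSketch() {}

    /**
     * Returns null when no interval is configured; otherwise currentTime plus the interval.
     * The multiplication is done in long arithmetic so large minute counts do not overflow int.
     */
    public static Long nextScanTime(Integer intervalMinutes, long currentTime) {
        if (intervalMinutes == null) {
            return null;
        }
        return currentTime + intervalMinutes.longValue() * MILLIS_PER_MINUTE;
    }
}
```

Using a boxed `Integer` for the interval lets "not configured" be represented as `null`, which then propagates to a `null` next-scan time instead of a misleading default.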