Uses of Class
org.apache.nutch.parse.HTMLMetaTags
Packages that use HTMLMetaTags
  Package: Description
  org.apache.nutch.analysis.lang: Text document language identifier.
  org.apache.nutch.microformats.reltag: A microformats Rel-Tag Parser/Indexer/Querier plugin.
  org.apache.nutch.parse: The Parse interface and related classes.
  org.apache.nutch.parse.headings: Parse filter to extract headings (h1, h2, etc.) from DOM parse tree.
  org.apache.nutch.parse.html: An HTML document parsing plugin.
  org.apache.nutch.parse.js: Parser and parse filter plugin to extract all (possible) links from JavaScript files and embedded JavaScript code snippets.
  org.apache.nutch.parse.metatags: Parse filter to extract meta tags: keywords, description, etc.
  org.apache.nutch.parse.tika: Parse various document formats with help of Apache Tika.
  org.apache.nutch.parsefilter.debug: Adds serialized DOM to parse data, useful for debugging, to understand how the parser implementation interprets a document (not only HTML).
  org.apache.nutch.parsefilter.naivebayes: HTML parse filter that classifies the outlinks from the parse result as relevant or irrelevant based on the relevancy of the parse text (using a training file in which you can give positive and negative example texts; see the description of parsefilter.naivebayes.trainfile). If an outlink is found irrelevant, it gets a second chance if it contains any of the words from the list given in parsefilter.naivebayes.wordlist.
  org.apache.nutch.parsefilter.regex: RegexParseFilter.
  org.creativecommons.nutch: Sample plugins that parse and index Creative Commons metadata.
Uses of HTMLMetaTags in org.apache.nutch.analysis.lang
Methods in org.apache.nutch.analysis.lang with parameters of type HTMLMetaTags
  Modifier and Type: ParseResult
  Method: HTMLLanguageParser.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
  Description: Scan the HTML document looking at possible indications of content language.
Uses of HTMLMetaTags in org.apache.nutch.microformats.reltag
Methods in org.apache.nutch.microformats.reltag with parameters of type HTMLMetaTags
  Modifier and Type: ParseResult
  Method: RelTagParser.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
  Description: Scan the HTML document looking at possible rel-tags.
Uses of HTMLMetaTags in org.apache.nutch.parse
Methods in org.apache.nutch.parse with parameters of type HTMLMetaTags
  Modifier and Type: ParseResult
  Method: HtmlParseFilter.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
  Description: Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page.

  Modifier and Type: ParseResult
  Method: HtmlParseFilters.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
  Description: Run all defined filters.
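The relationship between a single filter and the "run all defined filters" method can be illustrated with a minimal, self-contained sketch. It does not use the Nutch classes: MetaTags, Result, Filter, and runAll are hypothetical stand-ins invented for this example, reduced to two arguments, and the sketch only assumes that filters run in order with each one receiving the result of the previous one. The real HtmlParseFilters implementation may differ (for example in error handling).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FilterChainSketch {

  // Hypothetical stand-in for HTMLMetaTags: carries only a noindex flag.
  static final class MetaTags {
    final boolean noIndex;
    MetaTags(boolean noIndex) { this.noIndex = noIndex; }
  }

  // Hypothetical stand-in for ParseResult: a mutable metadata map.
  static final class Result {
    final Map<String, String> metadata = new LinkedHashMap<>();
  }

  // Same general shape as the filter(...) methods listed above,
  // reduced to the result and the meta tags (an assumption of this sketch).
  interface Filter {
    Result filter(Result result, MetaTags metaTags);
  }

  // Runs every filter in order, feeding each one the previous result.
  static Result runAll(List<Filter> filters, Result result, MetaTags metaTags) {
    for (Filter f : filters) {
      result = f.filter(result, metaTags);
    }
    return result;
  }

  public static void main(String[] args) {
    List<Filter> filters = new ArrayList<>();
    // First filter records the robots noindex flag from the meta tags.
    filters.add((r, m) -> {
      r.metadata.put("robots.noindex", String.valueOf(m.noIndex));
      return r;
    });
    // Second filter adds an unrelated marker to show that both filters ran.
    filters.add((r, m) -> {
      r.metadata.put("filters.run", "2");
      return r;
    });

    Result out = runAll(filters, new Result(), new MetaTags(true));
    System.out.println(out.metadata); // prints {robots.noindex=true, filters.run=2}
  }
}
```

Passing the result through the chain, rather than having each filter start from scratch, is what lets independent plugins each contribute their own metadata to the same parse.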
Uses of HTMLMetaTags in org.apache.nutch.parse.headings
Methods in org.apache.nutch.parse.headings with parameters of type HTMLMetaTags
  Modifier and Type: ParseResult
  Method: HeadingsParseFilter.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
Uses of HTMLMetaTags in org.apache.nutch.parse.html
Methods in org.apache.nutch.parse.html with parameters of type HTMLMetaTags
  Modifier and Type: static void
  Method: HTMLMetaProcessor.getMetaTags(HTMLMetaTags metaTags, Node node, URL currURL)
  Description: Sets the indicators in robotsMeta to appropriate values, based on any META tags found under the given node.
Uses of HTMLMetaTags in org.apache.nutch.parse.js
Methods in org.apache.nutch.parse.js with parameters of type HTMLMetaTags
  Modifier and Type: ParseResult
  Method: JSParseFilter.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
  Description: Scan the JavaScript fragments of an HTML page looking for possible Outlinks.
Uses of HTMLMetaTags in org.apache.nutch.parse.metatags
Methods in org.apache.nutch.parse.metatags with parameters of type HTMLMetaTags
  Modifier and Type: ParseResult
  Method: MetaTagsParser.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
Uses of HTMLMetaTags in org.apache.nutch.parse.tika
Methods in org.apache.nutch.parse.tika with parameters of type HTMLMetaTags
  Modifier and Type: static void
  Method: HTMLMetaProcessor.getMetaTags(HTMLMetaTags metaTags, Node node, URL currURL)
  Description: Sets the indicators in robotsMeta to appropriate values, based on any META tags found under the given node.
Uses of HTMLMetaTags in org.apache.nutch.parsefilter.debug
Methods in org.apache.nutch.parsefilter.debug with parameters of type HTMLMetaTags
  Modifier and Type: ParseResult
  Method: DebugParseFilter.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
Uses of HTMLMetaTags in org.apache.nutch.parsefilter.naivebayes
Methods in org.apache.nutch.parsefilter.naivebayes with parameters of type HTMLMetaTags
  Modifier and Type: ParseResult
  Method: NaiveBayesParseFilter.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
Uses of HTMLMetaTags in org.apache.nutch.parsefilter.regex
Methods in org.apache.nutch.parsefilter.regex with parameters of type HTMLMetaTags
  Modifier and Type: ParseResult
  Method: RegexParseFilter.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
Uses of HTMLMetaTags in org.creativecommons.nutch
Methods in org.creativecommons.nutch with parameters of type HTMLMetaTags
  Modifier and Type: ParseResult
  Method: CCParseFilter.filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
  Description: Adds metadata or otherwise modifies a parse of an HTML document, given the DOM tree of a page.