Uses of Class
org.apache.nutch.parse.ParseData
Packages that use ParseData
Package    Description
org.apache.nutch.crawl    Crawl control code and tools to run the crawler.
org.apache.nutch.parse    The Parse interface and related classes.
org.apache.nutch.scoring    The ScoringFilter interface.
org.apache.nutch.scoring.depth    Scoring filter to stop crawling at a configurable depth (number of "hops" from seed URLs).
org.apache.nutch.scoring.metadata    Metadata Scoring Plugin
org.apache.nutch.scoring.opic    Scoring filter implementing a variant of the Online Page Importance Computation (OPIC) algorithm.
org.apache.nutch.scoring.similarity
org.apache.nutch.scoring.similarity.cosine    Implements the cosine similarity metric for scoring relevant documents.
org.apache.nutch.scoring.urlmeta    URL Meta Tag Scoring Plugin
org.apache.nutch.segment    A segment stores all data from one generate/fetch/update cycle: fetch list, protocol status, raw content, parsed content, and extracted outgoing links.
org.apache.nutch.tools    Miscellaneous tools.
Uses of ParseData in org.apache.nutch.crawl
Methods in org.apache.nutch.crawl with parameters of type ParseData
Modifier and Type    Method    Description
void    LinkDb.LinkDbMapper.map(Text key, ParseData parseData, Mapper.Context context)
Uses of ParseData in org.apache.nutch.parse
Methods in org.apache.nutch.parse that return ParseData
Modifier and Type    Method    Description
ParseData    Parse.getData()    Other data extracted from the page.
ParseData    ParseImpl.getData()
static ParseData    ParseData.read(DataInput in)

Methods in org.apache.nutch.parse with parameters of type ParseData
Modifier and Type    Method    Description
void    ParseResult.put(String key, ParseText text, ParseData data)    Store a result of parsing.
void    ParseResult.put(Text key, ParseText text, ParseData data)    Store a result of parsing.

Constructors in org.apache.nutch.parse with parameters of type ParseData
Constructor    Description
ParseImpl(String text, ParseData data)
ParseImpl(ParseText text, ParseData data)
ParseImpl(ParseText text, ParseData data, boolean isCanonical)
Uses of ParseData in org.apache.nutch.scoring
Methods in org.apache.nutch.scoring with parameters of type ParseData
Modifier and Type    Method    Description
CrawlDatum    AbstractScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
CrawlDatum    ScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)    Distribute score value from the current page to all its outlinked pages.
CrawlDatum    ScoringFilters.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
Uses of ParseData in org.apache.nutch.scoring.depth
Methods in org.apache.nutch.scoring.depth with parameters of type ParseData
Modifier and Type    Method    Description
CrawlDatum    DepthScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
Uses of ParseData in org.apache.nutch.scoring.metadata
Methods in org.apache.nutch.scoring.metadata with parameters of type ParseData
Modifier and Type    Method    Description
CrawlDatum    MetadataScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)    Takes the metadata listed in your "scoring.parse.md" property and looks for them inside the parseData object.
Uses of ParseData in org.apache.nutch.scoring.opic
Methods in org.apache.nutch.scoring.opic with parameters of type ParseData
Modifier and Type    Method    Description
CrawlDatum    OPICScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)    Get a float value from Fetcher.SCORE_KEY, divide it by the number of outlinks and apply.
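The OPIC description (take a score, divide it by the number of outlinks, apply it) can be illustrated with a minimal, dependency-free sketch. This is not the Nutch implementation: the class OpicShareSketch, its methods share and distribute, and the use of a plain Map from URL to score in place of Collection<Map.Entry<Text,CrawlDatum>> are all assumptions made for illustration, as is the choice to return 0 when a page has no outlinks.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for OPIC-style score distribution; not the Nutch code. */
final class OpicShareSketch {

    /** Splits a page score evenly across its outlinks; returns 0 when there are none (assumed policy). */
    static float share(float pageScore, int outlinkCount) {
        if (outlinkCount <= 0) {
            return 0.0f; // nothing to distribute to
        }
        return pageScore / outlinkCount;
    }

    /** Adds the per-outlink share to every target score (URL -> score), in place. */
    static void distribute(float pageScore, Map<String, Float> targets) {
        float perLink = share(pageScore, targets.size());
        for (Map.Entry<String, Float> entry : targets.entrySet()) {
            entry.setValue(entry.getValue() + perLink);
        }
    }

    public static void main(String[] args) {
        // Two outlinks with no prior score; the page score of 1.0 is split between them.
        Map<String, Float> targets = new LinkedHashMap<>();
        targets.put("http://example.org/a", 0.0f);
        targets.put("http://example.org/b", 0.0f);
        distribute(1.0f, targets);
        System.out.println(targets);
    }
}
```

In the real method the source score is read from the parse metadata and the result is applied to CrawlDatum objects; the sketch only shows the arithmetic of the even split.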
Uses of ParseData in org.apache.nutch.scoring.similarity
Methods in org.apache.nutch.scoring.similarity with parameters of type ParseData
Modifier and Type    Method    Description
CrawlDatum    SimilarityModel.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
CrawlDatum    SimilarityScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
Uses of ParseData in org.apache.nutch.scoring.similarity.cosine
Methods in org.apache.nutch.scoring.similarity.cosine with parameters of type ParseData
Modifier and Type    Method    Description
CrawlDatum    CosineSimilarity.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
Uses of ParseData in org.apache.nutch.scoring.urlmeta
Methods in org.apache.nutch.scoring.urlmeta with parameters of type ParseData
Modifier and Type    Method    Description
CrawlDatum    URLMetaScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)    Takes the metatags listed in your "urlmeta.tags" property and looks for them inside the parseData object.
Uses of ParseData in org.apache.nutch.segment
Methods in org.apache.nutch.segment with parameters of type ParseData
Modifier and Type    Method    Description
boolean    SegmentMergeFilter.filter(Text key, CrawlDatum generateData, CrawlDatum fetchData, CrawlDatum sigData, Content content, ParseData parseData, ParseText parseText, Collection<CrawlDatum> linked)    The filtering method, which receives all the information being merged for a given key (URL).
boolean    SegmentMergeFilters.filter(Text key, CrawlDatum generateData, CrawlDatum fetchData, CrawlDatum sigData, Content content, ParseData parseData, ParseText parseText, Collection<CrawlDatum> linked)    Iterates over all SegmentMergeFilter extensions; if any of them returns false, it returns false as well.
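The behaviour described for SegmentMergeFilters.filter (return false as soon as any filter rejects the entry) is a short-circuiting AND over a list of filters. The sketch below shows that pattern only. It is an assumption-laden stand-in: MergeFilterChainSketch and accept are hypothetical names, and each filter is reduced to a Predicate<String> over the URL key instead of the full eight-argument filter signature.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical illustration of an "all filters must accept" chain; not the Nutch code. */
final class MergeFilterChainSketch {

    /** Returns false if any filter rejects the URL, true if every filter accepts it. */
    static boolean accept(List<Predicate<String>> filters, String url) {
        for (Predicate<String> filter : filters) {
            if (!filter.test(url)) {
                return false; // one rejection is enough; remaining filters are skipped
            }
        }
        return true; // also true for an empty filter list
    }

    public static void main(String[] args) {
        Predicate<String> httpOnly = url -> url.startsWith("http");
        Predicate<String> noQuery = url -> !url.contains("?");
        List<Predicate<String>> filters = List.of(httpOnly, noQuery);

        System.out.println(accept(filters, "http://example.org/page"));
        System.out.println(accept(filters, "http://example.org/page?id=1"));
    }
}
```

Returning true for an empty list is a choice made in this sketch so that a setup with no filters keeps every entry; the source does not state what happens in that case.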
Uses of ParseData in org.apache.nutch.tools
Methods in org.apache.nutch.tools with parameters of type ParseData
Modifier and Type    Method    Description
String    AbstractCommonCrawlFormat.getJsonData(String url, Content content, Metadata metadata, ParseData parseData)
String    CommonCrawlFormat.getJsonData(String url, Content content, Metadata metadata, ParseData parseData)    Returns a string representation of the JSON structure of the URL content.
String    CommonCrawlFormatWARC.getJsonData(String url, Content content, Metadata metadata, ParseData parseData)

Constructors in org.apache.nutch.tools with parameters of type ParseData
Constructor    Description
CommonCrawlFormatWARC(String url, Content content, Metadata metadata, Configuration nutchConf, CommonCrawlConfig config, ParseData parseData)