Package org.apache.nutch.scoring.tld
Class TLDScoringFilter
- java.lang.Object
  - org.apache.nutch.scoring.AbstractScoringFilter
    - org.apache.nutch.scoring.tld.TLDScoringFilter
All Implemented Interfaces:
Configurable, Pluggable, ScoringFilter
public class TLDScoringFilter extends AbstractScoringFilter
Scoring filter to boost top-level domains (TLDs).
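The class description says only that this filter boosts TLDs. Below is a minimal sketch, under stated assumptions, of what a multiplicative per-TLD boost applied during indexer scoring could look like. The boost table, its values, and the names TldBoostSketch and topLevelDomain are hypothetical and not taken from Nutch. The sketch also treats the last host label as the TLD, which ignores multi-label suffixes such as co.uk.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;

public class TldBoostSketch {

    // Hypothetical boost table: TLD -> multiplicative boost.
    // The values are illustrative only, not Nutch defaults.
    static final Map<String, Float> TLD_BOOSTS = Map.of(
            "edu", 1.5f,
            "gov", 1.2f);

    /** Returns the last label of the URL's host, lower-cased, or null if there is no host. */
    static String topLevelDomain(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return null;
        }
        // Simplification: the label after the last dot is taken as the TLD.
        int dot = host.lastIndexOf('.');
        return host.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Multiplies the incoming score by the TLD's boost; unknown TLDs leave it unchanged. */
    static float indexerScore(String url, float initScore) {
        String tld = topLevelDomain(url);
        float boost = (tld == null) ? 1.0f : TLD_BOOSTS.getOrDefault(tld, 1.0f);
        return initScore * boost;
    }

    public static void main(String[] args) {
        System.out.println(indexerScore("http://www.example.edu/page", 2.0f)); // 3.0
        System.out.println(indexerScore("http://www.example.com/page", 2.0f)); // 2.0
    }
}
```

Multiplying rather than adding keeps an unlisted TLD neutral (boost 1.0) and composes cleanly with boosts contributed by other scoring filters.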
Field Summary
Fields inherited from interface org.apache.nutch.scoring.ScoringFilter
X_POINT_ID
Constructor Summary
TLDScoringFilter()
Method Summary
float indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore)
This method calculates an indexed document score/boost.
Methods inherited from class org.apache.nutch.scoring.AbstractScoringFilter
distributeScoreToOutlinks, generatorSortValue, getConf, initialScore, injectedScore, passScoreAfterParsing, passScoreBeforeParsing, setConf, updateDbScore
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.nutch.scoring.ScoringFilter
orphanedScore
Method Detail
indexerScore
public float indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore) throws ScoringFilterException
Description copied from interface: ScoringFilter
This method calculates an indexed document score/boost.
Specified by:
indexerScore in interface ScoringFilter
Overrides:
indexerScore in class AbstractScoringFilter
Parameters:
url - URL of the page
doc - indexed document. NOTE: this already contains all information collected by indexing filters. Implementations may modify this instance in order to store or remove some information.
dbDatum - current page from CrawlDb. NOTE:
- changes made to this instance are not persisted
- may be null if indexing is done without CrawlDb or if the segment was not generated from the CrawlDb (e.g., via FreeGenerator).
fetchDatum - datum from FetcherOutput (containing, among other things, the fetch status)
parse - parsing result. NOTE: changes made to this instance are not persisted.
inlinks - current inlinks from LinkDb. NOTE: changes made to this instance are not persisted.
initScore - initial boost value for the indexed document.
Returns:
boost value for the indexed document. This value is passed as an argument to the next scoring filter in the chain. NOTE: implementations may also express other scoring strategies by modifying the indexed document directly.
Throws:
ScoringFilterException - if there is a fatal error while calculating the indexed document score/boost
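The return-value note states that each filter's boost is passed as an argument to the next scoring filter in the chain. The following is a minimal sketch of that chaining. It uses a hypothetical one-method interface (IndexerScorer) in place of the full ScoringFilter signature so that it runs without Nutch on the classpath; the two filters in main are illustrative only.

```java
import java.util.List;

public class ScoringChainSketch {

    // Hypothetical, simplified stand-in for ScoringFilter.indexerScore:
    // only the URL and the incoming boost are modelled here.
    interface IndexerScorer {
        float indexerScore(String url, float initScore);
    }

    /** Runs the filters in order, feeding each filter's result to the next one as initScore. */
    static float runChain(List<IndexerScorer> filters, String url, float initScore) {
        float score = initScore;
        for (IndexerScorer filter : filters) {
            score = filter.indexerScore(url, score);
        }
        return score;
    }

    public static void main(String[] args) {
        // Hypothetical filters: one doubles the score, one boosts .edu URLs by 1.5.
        IndexerScorer doubler = (url, s) -> s * 2.0f;
        IndexerScorer eduBoost = (url, s) -> url.contains(".edu/") ? s * 1.5f : s;
        System.out.println(runChain(List.of(doubler, eduBoost), "http://example.edu/", 1.0f)); // 3.0
    }
}
```

Because each filter sees only the previous filter's output, the order of filters matters whenever they mix operations (for example, adding versus multiplying).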