Class LinkDumper
java.lang.Object
  org.apache.hadoop.conf.Configured
    org.apache.nutch.scoring.webgraph.LinkDumper
All Implemented Interfaces:
Configurable, Tool
public class LinkDumper extends Configured implements Tool
The LinkDumper tool creates a database of node-to-inlink information that can be read using the nested Reader class. This allows the inlink and scoring state of a single url to be reviewed quickly to determine why that url ranks the way it does. This tool is intended to be used with the LinkRank analysis.
Nested Class Summary
Modifier and Type | Class | Description
static class | LinkDumper.Inverter | Inverts outlinks from the WebGraph to inlinks and attaches node information.
static class | LinkDumper.LinkNode | Bean class which holds url to node information.
static class | LinkDumper.LinkNodes | Writable class which holds an array of LinkNode objects.
static class | LinkDumper.Merger | Merges LinkNode objects into a single array value per url.
static class | LinkDumper.Reader | Reader class which will print out the url and all of its inlinks to System.out.
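The following is a minimal in-memory sketch of the idea behind the Inverter, Merger, and Reader classes, not Nutch's implementation: it does not use Hadoop, and the Outlink and LinkNode records, the single float score standing in for the node information, and the invert/merge method names are all hypothetical. Inverting turns each outlink "from -> to" into an inlink record keyed by the target url; merging gathers every record for one url into a single array, which is what makes a per-url lookup cheap.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class InlinkInversionSketch {

    // Hypothetical stand-in for LinkDumper.LinkNode: the url of a linking page
    // plus a simplified piece of its node information (here just a score).
    public record LinkNode(String url, float score) {}

    // Hypothetical outlink edge as it might appear in a WebGraph: from -> to.
    public record Outlink(String from, String to) {}

    // Inverter-like step: key each edge by its target url and attach the
    // source page's node info (score 0 if the source has no known score).
    public static List<Map.Entry<String, LinkNode>> invert(List<Outlink> outlinks,
                                                           Map<String, Float> nodeScores) {
        List<Map.Entry<String, LinkNode>> inverted = new ArrayList<>();
        for (Outlink link : outlinks) {
            float score = nodeScores.getOrDefault(link.from(), 0.0f);
            inverted.add(Map.entry(link.to(), new LinkNode(link.from(), score)));
        }
        return inverted;
    }

    // Merger-like step: collect all LinkNode records for the same url into one array.
    public static Map<String, LinkNode[]> merge(List<Map.Entry<String, LinkNode>> inverted) {
        Map<String, List<LinkNode>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, LinkNode> e : inverted) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        Map<String, LinkNode[]> merged = new LinkedHashMap<>();
        grouped.forEach((url, nodes) -> merged.put(url, nodes.toArray(new LinkNode[0])));
        return merged;
    }

    public static void main(String[] args) {
        List<Outlink> outlinks = List.of(
            new Outlink("http://a.example/", "http://c.example/"),
            new Outlink("http://b.example/", "http://c.example/"),
            new Outlink("http://c.example/", "http://a.example/"));
        Map<String, Float> scores = Map.of(
            "http://a.example/", 0.5f,
            "http://b.example/", 0.25f,
            "http://c.example/", 1.0f);

        Map<String, LinkNode[]> db = merge(invert(outlinks, scores));

        // Reader-like lookup: print one url followed by all of its inlinks.
        String target = "http://c.example/";
        System.out.println(target);
        for (LinkNode node : db.getOrDefault(target, new LinkNode[0])) {
            System.out.println("  " + node.url() + " score=" + node.score());
        }
    }
}
```

Running main prints http://c.example/ followed by its two inlinks, http://a.example/ with score=0.5 and http://b.example/ with score=0.25. Records require Java 16 or later.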
Constructor Summary
Constructor
LinkDumper()
Method Summary
Modifier and Type | Method | Description
void | dumpLinks(Path webGraphDb) | Runs the inverter and merger jobs of the LinkDumper tool to create the url to inlink node database.
static void | main(String[] args) |
int | run(String[] args) | Runs the LinkDumper tool.
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
Field Detail
DUMP_DIR
public static final String DUMP_DIR
See Also:
Constant Field Values
Method Detail
dumpLinks
public void dumpLinks(Path webGraphDb) throws IOException, InterruptedException, ClassNotFoundException
Runs the inverter and merger jobs of the LinkDumper tool to create the url to inlink node database.
Parameters:
webGraphDb - the Path to the output of WebGraph.createWebGraph(Path, Path[], boolean, boolean)
Throws:
IOException - if there is a fatal I/O issue at runtime
InterruptedException - if the Job is interrupted during execution
ClassNotFoundException - if classes required to run the Job cannot be located
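dumpLinks runs two jobs in sequence, inverter then merger. The sketch below mimics that two-stage shape with plain files, assuming (this page does not say so) that the first stage writes to a temporary directory which the second stage reads and which is deleted afterwards. Tab-separated text files stand in for Hadoop job input and output, and the TwoStagePipelineSketch class, its invertStage/mergeStage methods, the two-argument dumpLinks signature, and the part-00000 file name are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

public class TwoStagePipelineSketch {

    // Stage 1 (stand-in for the inverter job): read "from<TAB>to" outlink lines
    // and write "to<TAB>from" inlink lines into the intermediate directory.
    static void invertStage(Path outlinkFile, Path intermediateDir) throws IOException {
        List<String> inverted = new ArrayList<>();
        for (String line : Files.readAllLines(outlinkFile)) {
            String[] parts = line.split("\t");
            inverted.add(parts[1] + "\t" + parts[0]);
        }
        Files.write(intermediateDir.resolve("part-00000"), inverted);
    }

    // Stage 2 (stand-in for the merger job): read every intermediate file and
    // write one line per url listing all of its inlinks.
    static void mergeStage(Path intermediateDir, Path outputFile) throws IOException {
        List<Path> parts;
        try (Stream<Path> s = Files.list(intermediateDir)) {
            parts = s.sorted().toList();
        }
        Map<String, List<String>> grouped = new TreeMap<>(); // keys kept sorted by url
        for (Path part : parts) {
            for (String line : Files.readAllLines(part)) {
                String[] kv = line.split("\t");
                grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
            }
        }
        List<String> lines = new ArrayList<>();
        grouped.forEach((url, inlinks) -> lines.add(url + "\t" + String.join(",", inlinks)));
        Files.write(outputFile, lines);
    }

    // Runs stage 1 then stage 2; the temporary directory is removed even if a stage fails.
    public static void dumpLinks(Path outlinkFile, Path outputFile) throws IOException {
        Path intermediate = Files.createTempDirectory("inverted-");
        try {
            invertStage(outlinkFile, intermediate);
            mergeStage(intermediate, outputFile);
        } finally {
            List<Path> toDelete;
            try (Stream<Path> walk = Files.walk(intermediate)) {
                // Reverse order deletes files before the directory that contains them.
                toDelete = walk.sorted(Comparator.reverseOrder()).toList();
            }
            for (Path p : toDelete) {
                Files.delete(p);
            }
        }
    }
}
```

Keeping the intermediate output in its own directory lets the second stage consume it as ordinary input, and the finally block ensures no intermediate data is left behind if either stage throws.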