Class NodeDumper
java.lang.Object
  org.apache.hadoop.conf.Configured
    org.apache.nutch.scoring.webgraph.NodeDumper

All Implemented Interfaces:
Configurable, Tool
public class NodeDumper extends Configured implements Tool
A tool that dumps the top urls by number of inlinks, number of outlinks, or score to a text file. One of the major uses of this tool is to check the top scoring urls of a link analysis program such as LinkRank. Dumping by number of inlinks or number of outlinks requires that the WebGraph program has been run. Dumping by link analysis score requires that a program such as LinkRank has been run, which updates the NodeDb of the WebGraph.
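The selection described above (pick one node property, sort in descending order, keep the top entries) can be illustrated with a minimal, self-contained sketch in plain Java. It does not use Nutch or Hadoop and is not the tool's implementation: the class `TopUrlsSketch`, the nested `NodeSketch` type, the `top` and `sample` methods, and all sample values are hypothetical names and data chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Illustration only: hypothetical names, no Nutch or Hadoop classes involved.
final class TopUrlsSketch {

    // Hypothetical stand-in for a WebGraph node; this is not the Nutch Node class.
    static final class NodeSketch {
        final String url;
        final int inlinks;
        final int outlinks;
        final double score;

        NodeSketch(String url, int inlinks, int outlinks, double score) {
            this.url = url;
            this.inlinks = inlinks;
            this.outlinks = outlinks;
            this.score = score;
        }
    }

    // Made-up sample data.
    static List<NodeSketch> sample() {
        return Arrays.asList(
            new NodeSketch("http://a.example/", 12, 3, 0.40),
            new NodeSketch("http://b.example/", 7, 9, 0.85),
            new NodeSketch("http://c.example/", 20, 1, 0.10));
    }

    // Sorts by the chosen property in descending order and keeps at most topN nodes.
    static List<NodeSketch> top(List<NodeSketch> nodes,
                                ToDoubleFunction<NodeSketch> property,
                                int topN) {
        List<NodeSketch> sorted = new ArrayList<>(nodes);
        sorted.sort(Comparator.comparingDouble(property).reversed());
        int limit = Math.max(0, Math.min(topN, sorted.size()));
        return new ArrayList<>(sorted.subList(0, limit));
    }

    public static void main(String[] args) {
        // Top 2 urls by number of inlinks.
        for (NodeSketch n : top(sample(), node -> node.inlinks, 2)) {
            System.out.println(n.url + "\t" + n.inlinks);
        }
        // Top 2 urls by score.
        for (NodeSketch n : top(sample(), node -> node.score, 2)) {
            System.out.println(n.url + "\t" + n.score);
        }
    }
}
```

Passing the property as a function keeps one sorting routine for all three dump types; the real tool works on the NodeDb through Hadoop jobs rather than on an in-memory list.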
-
Nested Class Summary
Nested Classes
Modifier and Type    Class                Description
static class         NodeDumper.Dumper    Outputs the hosts or domains with an associated value.
static class         NodeDumper.Sorter    Outputs the top urls sorted in descending order.
-
Constructor Summary
Constructors
Constructor     Description
NodeDumper()
-
Method Summary
Methods
Modifier and Type    Method    Description
void    dumpNodes(Path webGraphDb, org.apache.nutch.scoring.webgraph.NodeDumper.DumpType type, long topN, Path output, boolean asEff, org.apache.nutch.scoring.webgraph.NodeDumper.NameType nameType, org.apache.nutch.scoring.webgraph.NodeDumper.AggrType aggrType, boolean asSequenceFile)    Runs the process to dump the top urls out to a text file.
static void    main(String[] args)
int    run(String[] args)    Runs the node dumper tool.
-
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
-
Method Detail
-
dumpNodes
public void dumpNodes(Path webGraphDb, org.apache.nutch.scoring.webgraph.NodeDumper.DumpType type, long topN, Path output, boolean asEff, org.apache.nutch.scoring.webgraph.NodeDumper.NameType nameType, org.apache.nutch.scoring.webgraph.NodeDumper.AggrType aggrType, boolean asSequenceFile) throws Exception
Runs the process to dump the top urls out to a text file.
Parameters:
webGraphDb - the WebGraph from which to pull values
type - the node property type to dump, one of NodeDumper.DumpType.INLINKS, NodeDumper.DumpType.OUTLINKS or NodeDumper.DumpType.SCORES
topN - maximum number of top links to dump
output - a Path to write output to
asEff - if true, use an equals sign as the separator, as expected by Solr's ExternalFileField; false otherwise
nameType - either NodeDumper.NameType.HOST or NodeDumper.NameType.DOMAIN
aggrType - the aggregation type, either NodeDumper.AggrType.MAX or NodeDumper.AggrType.SUM
asSequenceFile - if true, output is written as SequenceFileOutputFormat; otherwise the default TextOutputFormat is used
Throws:
Exception - if an error occurs while dumping the top values
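To make the interplay of nameType, aggrType and asEff more concrete, the following is a minimal sketch in plain Java of grouping url values by host, aggregating with MAX or SUM, and writing lines with either an equals sign or a tab as the separator. It is an illustration under assumptions, not the tool's implementation: `HostAggregationSketch`, its `Aggr` enum, the `aggregateByHost`, `format` and `sample` methods, and the sample values are all hypothetical. The tab separator for the non-asEff case is assumed from the usual TextOutputFormat default, and domain grouping is left out.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: hypothetical names, no Nutch or Hadoop classes involved.
final class HostAggregationSketch {

    // Hypothetical mirror of the MAX and SUM aggregation types.
    enum Aggr { MAX, SUM }

    // Made-up url-to-value sample data.
    static Map<String, Double> sample() {
        Map<String, Double> values = new LinkedHashMap<>();
        values.put("http://a.example/x", 0.5);
        values.put("http://a.example/y", 0.25);
        values.put("http://b.example/", 0.625);
        return values;
    }

    // Groups url values by host and combines them with the chosen aggregation.
    static Map<String, Double> aggregateByHost(Map<String, Double> urlValues, Aggr aggr) {
        Map<String, Double> byHost = new TreeMap<>(); // sorted by host for stable output
        for (Map.Entry<String, Double> e : urlValues.entrySet()) {
            String host = URI.create(e.getKey()).getHost();
            if (aggr == Aggr.MAX) {
                byHost.merge(host, e.getValue(), Double::max);
            } else {
                byHost.merge(host, e.getValue(), Double::sum);
            }
        }
        return byHost;
    }

    // Formats one "key<separator>value" line per entry; "=" when asEff is true, tab otherwise.
    static List<String> format(Map<String, Double> values, boolean asEff) {
        String separator = asEff ? "=" : "\t";
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Double> e : values.entrySet()) {
            lines.add(e.getKey() + separator + e.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : format(aggregateByHost(sample(), Aggr.SUM), true)) {
            System.out.println(line);
        }
        for (String line : format(aggregateByHost(sample(), Aggr.MAX), false)) {
            System.out.println(line);
        }
    }
}
```

With SUM, a host's value grows with the number of its urls; with MAX, the host is represented by its single highest-valued url. Which one is appropriate depends on how the dumped values are consumed downstream.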
-
-