Package org.apache.nutch.segment
Class SegmentReader
- java.lang.Object
  - org.apache.hadoop.conf.Configured
    - org.apache.nutch.segment.SegmentReader
All Implemented Interfaces:
Configurable, Tool
public class SegmentReader extends Configured implements Tool
Dump the content of a segment.
Nested Class Summary
- static class SegmentReader.InputCompatMapper
- static class SegmentReader.InputCompatReducer
- static class SegmentReader.SegmentReaderStats
- static class SegmentReader.TextOutputFormat: Implements a text output format.
Constructor Summary
- SegmentReader()
Method Summary
- void dump(Path segment, Path output)
- void get(Path segment, Text key, Writer writer, Map<String,List<Writable>> results)
- static Charset getCharset(Metadata parseMeta): Try to get HTML encoding from parse metadata.
- void getStats(Path segment, SegmentReader.SegmentReaderStats stats)
- void list(List<Path> dirs, Writer writer)
- static void main(String[] args)
- int run(String[] args)
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
Method Detail
dump
public void dump(Path segment, Path output) throws IOException, InterruptedException, ClassNotFoundException
get
public void get(Path segment, Text key, Writer writer, Map<String,List<Writable>> results) throws Exception
Throws:
Exception
getCharset
public static Charset getCharset(Metadata parseMeta)
Try to get the HTML encoding from the parse metadata. Tries Nutch.CHAR_ENCODING_FOR_CONVERSION first, then HttpHeaders.CONTENT_ENCODING, and falls back to StandardCharsets.UTF_8.
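The lookup order described for getCharset can be sketched with the JDK alone. This is a minimal sketch, not Nutch's implementation: a plain Map<String, String> stands in for Nutch's Metadata, and the key strings assigned to the two constants are assumptions, not confirmed values of Nutch.CHAR_ENCODING_FOR_CONVERSION and HttpHeaders.CONTENT_ENCODING. It also assumes that a name Charset.forName cannot resolve falls straight back to UTF-8 rather than trying the next key.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class CharsetFallbackSketch {
    // ASSUMED key strings standing in for Nutch.CHAR_ENCODING_FOR_CONVERSION
    // and HttpHeaders.CONTENT_ENCODING; the real constants may differ.
    static final String CHAR_ENCODING_FOR_CONVERSION = "CharEncodingForConversion";
    static final String CONTENT_ENCODING = "Content-Encoding";

    /** Mirrors the documented lookup order, using a plain Map instead of Nutch's Metadata. */
    static Charset getCharset(Map<String, String> parseMeta) {
        // 1. Prefer the encoding recorded for conversion.
        String name = parseMeta.get(CHAR_ENCODING_FOR_CONVERSION);
        // 2. Otherwise use the value stored under the content-encoding key.
        if (name == null) {
            name = parseMeta.get(CONTENT_ENCODING);
        }
        // 3. Nothing recorded: fall back to UTF-8.
        if (name == null) {
            return StandardCharsets.UTF_8;
        }
        try {
            return Charset.forName(name);
        } catch (IllegalArgumentException e) {
            // Malformed or unsupported name: fall back to UTF-8 (assumed behavior).
            return StandardCharsets.UTF_8;
        }
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>();
        System.out.println(getCharset(meta));  // UTF-8
        meta.put(CONTENT_ENCODING, "ISO-8859-1");
        System.out.println(getCharset(meta));  // ISO-8859-1
        meta.put(CHAR_ENCODING_FOR_CONVERSION, "UTF-16");
        System.out.println(getCharset(meta));  // UTF-16
    }
}
```

Catching IllegalArgumentException covers both IllegalCharsetNameException and UnsupportedCharsetException, so a malformed or unknown encoding name never propagates to the caller.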
getStats
public void getStats(Path segment, SegmentReader.SegmentReaderStats stats) throws Exception
Throws:
Exception