Package org.apache.nutch.tools.arc
Class ArcSegmentCreator
java.lang.Object
  org.apache.hadoop.conf.Configured
    org.apache.nutch.tools.arc.ArcSegmentCreator
All Implemented Interfaces:
Configurable, Tool
public class ArcSegmentCreator extends Configured implements Tool
The ArcSegmentCreator is a replacement for the fetcher. It takes ARC files as input and produces a Nutch segment as output. ARC files are essentially tar-like archives of gzip-compressed records, and they are produced by both the Internet Archive project and the Grub distributed crawler project.
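For orientation, each record in an ARC file starts with a one-line header. The sketch below assumes the version-1 URL-record header layout from the Internet Archive's ARC format ("<url> <ip-address> <archive-date> <content-type> <length>", with the archive date written as yyyyMMddHHmmss). The names ArcHeaderSketch, ArcHeader and parse are hypothetical; this is an illustration of the input format, not ArcSegmentCreator's own parsing code.

```java
import java.util.Objects;

/**
 * Hypothetical sketch (not Nutch code): parses an ARC version-1 URL-record
 * header line of the form "<url> <ip-address> <archive-date> <content-type> <length>".
 */
final class ArcHeaderSketch {

    /** Immutable holder for the five whitespace-separated header fields. */
    static final class ArcHeader {
        final String url;
        final String ipAddress;
        final String archiveDate; // assumed yyyyMMddHHmmss, as in ARC v1
        final String contentType;
        final long length;        // byte length of the record body that follows

        ArcHeader(String url, String ipAddress, String archiveDate,
                  String contentType, long length) {
            this.url = url;
            this.ipAddress = ipAddress;
            this.archiveDate = archiveDate;
            this.contentType = contentType;
            this.length = length;
        }
    }

    private ArcHeaderSketch() {}

    /** Splits the header on runs of whitespace and rejects lines that do not have exactly five fields. */
    static ArcHeader parse(String line) {
        Objects.requireNonNull(line, "line");
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 5) {
            throw new IllegalArgumentException(
                "expected 5 header fields, got " + fields.length + ": " + line);
        }
        long length;
        try {
            length = Long.parseLong(fields[4]);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("bad length field: " + fields[4], e);
        }
        return new ArcHeader(fields[0], fields[1], fields[2], fields[3], length);
    }
}
```

For example, parsing "http://www.archive.org/ 207.241.224.11 19961231235900 text/html 2354" yields the URL, IP address, archive date, content type text/html and a body length of 2354 bytes.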
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | ArcSegmentCreator.ArcSegmentCreatorMapper |
Field Summary
Fields
Modifier and Type | Field | Description
static String | URL_VERSION |
Constructor Summary
Constructors
Constructor | Description
ArcSegmentCreator() |
ArcSegmentCreator(Configuration conf) | Constructor that sets the job configuration.
Method Summary
Modifier and Type | Method | Description
void | close() |
void | createSegments(Path arcFiles, Path segmentsOutDir) | Creates and runs the job that converts ARC files into segments.
static String | generateSegmentName() | Generates a random name for the segments.
static void | main(String[] args) |
int | run(String[] args) |
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
Field Detail
URL_VERSION
public static final String URL_VERSION
See Also:
Constant Field Values
Constructor Detail
ArcSegmentCreator
public ArcSegmentCreator()

ArcSegmentCreator
public ArcSegmentCreator(Configuration conf)
Constructor that sets the job configuration.
Parameters:
conf - a populated Configuration
Method Detail
generateSegmentName
public static String generateSegmentName()
Generates a random name for the segments.
Returns:
The generated segment name.
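The description above only says the name is "random". Nutch segment directories are conventionally named with a 14-digit yyyyMMddHHmmss timestamp, so the sketch below assumes that scheme; it is not guaranteed to match generateSegmentName() exactly. The class name SegmentNameSketch is hypothetical, and formatting in UTC (to avoid duplicate local times during a daylight-saving fall-back) is a design choice of this sketch.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

/**
 * Hypothetical sketch of timestamp-based segment naming (yyyyMMddHHmmss, UTC).
 * Not the actual ArcSegmentCreator implementation.
 */
final class SegmentNameSketch {

    // Last epoch second handed out; guarded by the class lock.
    private static long lastSecond = -1;

    private SegmentNameSketch() {}

    /**
     * Returns a 14-digit timestamp name. If called again within the same
     * second, it waits for the next second so that two calls in one JVM
     * never return the same name.
     */
    static synchronized String generateSegmentName() {
        long nowSecond = System.currentTimeMillis() / 1000;
        while (nowSecond <= lastSecond) {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for a unique name", e);
            }
            nowSecond = System.currentTimeMillis() / 1000;
        }
        lastSecond = nowSecond;
        // SimpleDateFormat is not thread-safe, so a fresh instance is created per call.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmss");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(new Date(nowSecond * 1000));
    }
}
```

Because the names are fixed-width digits, sorting segment directories by name also sorts them by creation time.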
close
public void close()
createSegments
public void createSegments(Path arcFiles, Path segmentsOutDir) throws IOException, InterruptedException, ClassNotFoundException
Creates and runs the job that converts ARC files into segments.
Parameters:
arcFiles - the path to the directory holding the ARC files
segmentsOutDir - the output directory for writing the segments
Throws:
IOException - if an I/O error occurs while running the job
InterruptedException - if the Job is interrupted
ClassNotFoundException - if a class cannot be located at runtime
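Because the class implements Tool, its run(String[] args) is expected to return an exit code, with 0 meaning success. The sketch below assumes run takes the same two positional arguments as createSegments (an ARC input directory and a segments output directory) and delegates to it. The names ArcToSegmentsRunSketch and SegmentJob, the exact argument check and the usage message are all hypothetical, and plain Strings stand in for Hadoop Path objects so the sketch has no dependencies.

```java
/**
 * Hypothetical sketch of a Tool-style run(String[]) that validates two
 * positional arguments and delegates to a createSegments-like callback.
 * Not the actual ArcSegmentCreator.run implementation.
 */
final class ArcToSegmentsRunSketch {

    /** Stand-in for createSegments(Path, Path); may throw, like the real method. */
    interface SegmentJob {
        void createSegments(String arcFiles, String segmentsOutDir) throws Exception;
    }

    private final SegmentJob job;

    ArcToSegmentsRunSketch(SegmentJob job) {
        this.job = job;
    }

    /** Returns 0 on success and -1 on bad usage or job failure. */
    int run(String[] args) {
        if (args.length != 2) {
            // Hypothetical usage message; the real tool's wording may differ.
            System.err.println("Usage: ArcSegmentCreator <arcFiles> <segmentsOutDir>");
            return -1;
        }
        try {
            job.createSegments(args[0], args[1]);
            return 0;
        } catch (Exception e) {
            // IOException, InterruptedException and ClassNotFoundException all end up here.
            System.err.println("ArcSegmentCreator: " + e);
            return -1;
        }
    }
}
```

Catching the job's checked exceptions inside run keeps the exit-code contract simple: callers only need to check for a non-zero result.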