Package org.apache.nutch.indexwriter.csv
Class CSVIndexWriter
java.lang.Object
  org.apache.nutch.indexwriter.csv.CSVIndexWriter

All Implemented Interfaces:
Configurable, IndexWriter, Pluggable

public class CSVIndexWriter extends Object implements IndexWriter
Writes Nutch documents to a CSV file (comma-separated values), i.e., dumps the index as a CSV or tab-separated plain-text table. The format (encoding, separators, etc.) is configurable through several options; see the output of describe().
Note: works only in local mode; to be used with the index option -noCommit.

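To illustrate what configurable separators and quote characters mean for the output, here is a minimal sketch of formatting one row. It is not the plugin's implementation: the class CsvRowSketch and its methods are hypothetical, and the quoting rule (quote a field only when needed, double any embedded quote character) is the common RFC 4180 convention, assumed here purely for illustration.

```java
import java.util.List;

/** Hypothetical helper, not part of Nutch: formats one row of a CSV table. */
final class CsvRowSketch {

  /** Quotes a field only if it contains the separator, the quote character, or a line break. */
  static String quoteIfNeeded(String field, char separator, char quote) {
    boolean needsQuoting = field.indexOf(separator) >= 0
        || field.indexOf(quote) >= 0
        || field.indexOf('\n') >= 0
        || field.indexOf('\r') >= 0;
    if (!needsQuoting) {
      return field;
    }
    String q = String.valueOf(quote);
    // Assumed escaping rule: embedded quote characters are doubled.
    return q + field.replace(q, q + q) + q;
  }

  /** Joins the (possibly quoted) fields with the configured separator. */
  static String formatRow(List<String> fields, char separator, char quote) {
    StringBuilder row = new StringBuilder();
    for (int i = 0; i < fields.size(); i++) {
      if (i > 0) {
        row.append(separator);
      }
      row.append(quoteIfNeeded(fields.get(i), separator, quote));
    }
    return row.toString();
  }
}
```

With a comma as separator, the field b,c has to be quoted; with a tab as separator, the same field can be written unchanged.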
Nested Class Summary
Modifier and Type | Class | Description
protected class | CSVIndexWriter.Separator | Represents separators (also quote and escape characters) as char(s) and byte(s) in the output encoding, for efficiency.

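The idea behind keeping a separator as both chars and bytes can be sketched as follows: the byte form is computed once for the output encoding, so writing a row does not re-encode the separator each time. SeparatorSketch and its members are hypothetical stand-ins; the actual fields and constructor of CSVIndexWriter.Separator are not shown on this page.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;

/** Hypothetical stand-in, not the actual CSVIndexWriter.Separator. */
final class SeparatorSketch {
  /** Separator as characters, e.g. for searching within field values. */
  final char[] chars;
  /** Separator pre-encoded in the output charset, ready to be written. */
  final byte[] bytes;

  SeparatorSketch(String separator, Charset encoding) {
    this.chars = separator.toCharArray();
    // Encode once up front instead of on every write.
    this.bytes = separator.getBytes(encoding);
  }

  /** Writes the pre-encoded separator to the given stream. */
  void writeTo(OutputStream out) throws IOException {
    out.write(bytes);
  }
}
```

The byte length depends on the encoding: a comma is one byte in UTF-8 but two bytes in UTF-16LE, which is why the byte form has to be derived from the configured charset.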
Field Summary
Modifier and Type | Field | Description
protected FSDataOutputStream | csvout |
protected Charset | encoding | encoding of CSV file

Fields inherited from interface org.apache.nutch.indexer.IndexWriter
X_POINT_ID

Constructor Summary
Constructor | Description
CSVIndexWriter() |

Method Summary
Modifier and Type | Method | Description
void | close() |
void | commit() | (nothing to commit)
void | delete(String key) | (deletion of documents is not supported)
Map<String,Map.Entry<String,Object>> | describe() | Returns Map with the specific parameters the IndexWriter instance can take.
Configuration | getConf() |
static void | main(String[] args) |
void | open(Configuration conf, String name) |
void | open(IndexWriterParams parameters) | Initializes the internal variables from a given index writer configuration.
void | setConf(Configuration conf) |
void | update(NutchDocument doc) |
void | write(NutchDocument doc) |

Field Detail

encoding
protected Charset encoding
encoding of CSV file

csvout
protected FSDataOutputStream csvout

Method Detail

open
public void open(Configuration conf, String name) throws IOException
Specified by:
open in interface IndexWriter
Parameters:
conf - Nutch configuration
name - target name of the IndexWriter to be opened
Throws:
IOException - Some exception thrown by some writer.

open
public void open(IndexWriterParams parameters) throws IOException
Initializes the internal variables from a given index writer configuration.
Specified by:
open in interface IndexWriter
Parameters:
parameters - Params from the index writer configuration.
Throws:
IOException - Some exception thrown by writer.

write
public void write(NutchDocument doc) throws IOException
Specified by:
write in interface IndexWriter
Throws:
IOException

delete
public void delete(String key)
(deletion of documents is not supported)
Specified by:
delete in interface IndexWriter

update
public void update(NutchDocument doc) throws IOException
Specified by:
update in interface IndexWriter
Throws:
IOException

close
public void close() throws IOException
Specified by:
close in interface IndexWriter
Throws:
IOException

commit
public void commit()
(nothing to commit)
Specified by:
commit in interface IndexWriter

getConf
public Configuration getConf()
Specified by:
getConf in interface Configurable

describe
public Map<String,Map.Entry<String,Object>> describe()
Returns Map with the specific parameters the IndexWriter instance can take.
Specified by:
describe in interface IndexWriter
Returns:
The values of each row. It must have the form <KEY,<DESCRIPTION,VALUE>>.

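The <KEY,<DESCRIPTION,VALUE>> shape of the returned map can be sketched with standard collections. This is an assumption-laden illustration, not the plugin's code: DescribeSketch is a hypothetical class, and the keys "separator" and "charset" and their descriptions are placeholders rather than the parameter names CSVIndexWriter actually reports.

```java
import java.util.AbstractMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of the <KEY,<DESCRIPTION,VALUE>> map shape. */
final class DescribeSketch {

  static Map<String, Map.Entry<String, Object>> describe(String fieldSeparator, String charset) {
    // LinkedHashMap keeps the parameters in insertion order when printed.
    Map<String, Map.Entry<String, Object>> properties = new LinkedHashMap<>();
    // Placeholder keys and descriptions; each entry pairs a description with the current value.
    properties.put("separator",
        new AbstractMap.SimpleEntry<String, Object>("Separator between fields", fieldSeparator));
    properties.put("charset",
        new AbstractMap.SimpleEntry<String, Object>("Encoding of the output file", charset));
    return properties;
  }
}
```

A caller can then iterate over the map and print each key together with entry.getKey() (the description) and entry.getValue() (the current value).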
setConf
public void setConf(Configuration conf)
Specified by:
setConf in interface Configurable