public class IndexedStorage extends PigStorage implements IndexableLoadFunc
IndexedStorage is a form of PigStorage that supports a
per-record seek. IndexedStorage creates a separate (hidden) index file for
every data file that is written. The format of the index file is:

| Header | Index Body | Footer |

The Header contains the list of record indices (field numbers) that represent index keys. The Index Body contains a
Tuple for each record in the data.
The fields of the Tuple are:

- The index key(s) Tuple
- The number of records that share this index key
- The offset into the data file to read the first matching record

The Footer contains, in sequence:

- The smallest key(s) Tuple in the index
- The largest key(s) Tuple in the index
- The offset in bytes to the start of the footer

IndexedStorage implements IndexableLoadFunc and
can be used as the 'right table' in a Pig 'merge' or 'merge-sparse' join.

IndexedStorage does not require the data to be globally partitioned and sorted
by index keys. Each partition (separate index) must be locally sorted.

Also note that IndexedStorage is a loader intended to demonstrate the "merge-sparse" join.

| Modifier and Type | Class and Description |
|---|---|
| static class | IndexedStorage.IndexedStorageInputFormat<br>Internal InputFormat class |
| static class | IndexedStorage.IndexedStorageOutputFormat<br>Internal OutputFormat class |
| static class | IndexedStorage.IndexManager<br>IndexManager manages the index file (both writing and reading). It keeps track of the last index read during reading. |
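
As a rough illustration of what a per-file index makes possible, the sketch below models index entries as (key, record count, data-file offset) and uses a binary search to find the first entry whose key is greater than or equal to a requested key. `IndexLookupSketch`, `IndexEntry`, and `firstOffsetAtOrAfter` are hypothetical names, the single `long` key stands in for a key Tuple, and the entry layout is an assumption; this is not the IndexManager implementation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a per-file index; not the actual IndexManager code.
class IndexLookupSketch {

    // One index entry: the key, how many records share it, and the byte
    // offset of the first such record in the data file (assumed layout).
    static final class IndexEntry {
        final long key;
        final long recordCount;
        final long offset;

        IndexEntry(long key, long recordCount, long offset) {
            this.key = key;
            this.recordCount = recordCount;
            this.offset = offset;
        }
    }

    // Returns the data-file offset of the first entry whose key is >= target,
    // or -1 if every key in the index is smaller than target.
    // The entries must be sorted by key in ascending order.
    static long firstOffsetAtOrAfter(List<IndexEntry> sortedEntries, long target) {
        int lo = 0;
        int hi = sortedEntries.size(); // search window is [lo, hi)
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedEntries.get(mid).key < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo < sortedEntries.size() ? sortedEntries.get(lo).offset : -1L;
    }

    public static void main(String[] args) {
        List<IndexEntry> index = Arrays.asList(
                new IndexEntry(10, 2, 0),
                new IndexEntry(20, 1, 48),
                new IndexEntry(40, 3, 72));
        System.out.println(firstOffsetAtOrAfter(index, 20)); // 48
        System.out.println(firstOffsetAtOrAfter(index, 25)); // 72
        System.out.println(firstOffsetAtOrAfter(index, 50)); // -1
    }
}
```

Because the search only needs the entries to be ordered within one index, a sketch like this works per partition, which matches the requirement that each partition be locally sorted.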
Nested classes/interfaces inherited from interface LoadPushDown: LoadPushDown.OperatorSet, LoadPushDown.RequiredField, LoadPushDown.RequiredFieldList, LoadPushDown.RequiredFieldResponse

| Modifier and Type | Field and Description |
|---|---|
| protected int | currentReaderIndexStart<br>Index into the list of readers to the current reader. |
| protected byte | fieldDelimiter<br>Delimiter to use between fields |
| protected int[] | offsetsToIndexKeys<br>Offsets to index keys in tuple |
| protected Comparator<IndexedStorage.IndexedStorageInputFormat.IndexedStorageRecordReader> | readerComparator<br>Comparator used to compare key tuples. |
| protected IndexedStorage.IndexedStorageInputFormat.IndexedStorageRecordReader[] | readers<br>List of record readers. |
Fields inherited from class PigStorage: caster, in, mLog, mRequiredColumns, schema, signature, writer

| Constructor and Description |
|---|
| IndexedStorage(String delimiter, String offsetsToIndexKeys)<br>Constructs a Pig Storer that uses the specified regex as a field delimiter. |
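
To make the role of the second constructor argument more concrete, here is a minimal sketch of how a string of key offsets could be turned into an `int[]` and used to pick the index key fields out of a record. The comma-separated format (for example `"0,2"`) is an assumption, since this page does not specify it, and `IndexKeySketch`, `parseOffsets`, and `extractKey` are hypothetical names; a list of strings stands in for a Pig Tuple.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: the argument format is an assumption, not taken from this page.
class IndexKeySketch {

    // Parses a comma-separated list of field positions, e.g. "0,2" -> [0, 2].
    static int[] parseOffsets(String offsetsToIndexKeys) {
        String[] parts = offsetsToIndexKeys.split(",");
        int[] offsets = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            offsets[i] = Integer.parseInt(parts[i].trim());
        }
        return offsets;
    }

    // Builds the index key for one record by picking the fields at the given offsets.
    static List<String> extractKey(List<String> record, int[] offsets) {
        List<String> key = new ArrayList<>(offsets.length);
        for (int offset : offsets) {
            key.add(record.get(offset));
        }
        return key;
    }

    public static void main(String[] args) {
        int[] offsets = parseOffsets("0,2");
        List<String> record = Arrays.asList("alice", "42", "nyc");
        System.out.println(extractKey(record, offsets)); // [alice, nyc]
    }
}
```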
| Modifier and Type | Method and Description |
|---|---|
| void | close()<br>A method called by the Pig runtime to give an opportunity for implementations to perform cleanup actions like closing the underlying input stream. |
| org.apache.hadoop.mapreduce.InputFormat | getInputFormat()<br>This will be called during planning on the front end. |
| Tuple | getNext()<br>Retrieves the next tuple to be processed. |
| org.apache.hadoop.mapreduce.OutputFormat | getOutputFormat()<br>Return the OutputFormat associated with StoreFuncInterface. |
| void | initialize(org.apache.hadoop.conf.Configuration conf)<br>IndexableLoadFunc interface implementation |
| void | seekNear(Tuple keys)<br>This method is called by the Pig runtime to indicate to the LoadFunc to position its underlying input stream near the keys supplied as the argument. |
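
The seekNear description on this page requires the supplied keys to be a leading prefix of the sort keys: for data sorted on the columns in positions 2, 4, 5, the key positions (2) and (2, 4) are valid, while (4) and (2, 5) are not. A minimal sketch of that prefix rule follows. `SeekKeySketch` and `isSortKeyPrefix` are hypothetical names, and the arrays of column positions stand in for Pig Tuples; this is not how IndexedStorage itself checks its input.

```java
// Illustrative check of the "prefix of the sort keys" rule; hypothetical names.
class SeekKeySketch {

    // Returns true when keyColumns is a non-empty leading prefix of sortColumns.
    static boolean isSortKeyPrefix(int[] sortColumns, int[] keyColumns) {
        if (keyColumns.length == 0 || keyColumns.length > sortColumns.length) {
            return false;
        }
        for (int i = 0; i < keyColumns.length; i++) {
            if (keyColumns[i] != sortColumns[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] sortColumns = {2, 4, 5};
        System.out.println(isSortKeyPrefix(sortColumns, new int[] {2}));       // true
        System.out.println(isSortKeyPrefix(sortColumns, new int[] {2, 4}));    // true
        System.out.println(isSortKeyPrefix(sortColumns, new int[] {2, 4, 5})); // true
        System.out.println(isSortKeyPrefix(sortColumns, new int[] {4}));       // false
        System.out.println(isSortKeyPrefix(sortColumns, new int[] {2, 5}));    // false
        System.out.println(isSortKeyPrefix(sortColumns, new int[] {4, 5}));    // false
    }
}
```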
Methods inherited from class PigStorage: checkSchema, cleanupOnFailure, cleanupOnSuccess, cleanupOutput, equals, equals, getFeatures, getPartitionKeys, getSchema, getStatistics, hashCode, prepareToRead, prepareToWrite, pushProjection, putNext, readField, relToAbsPathForStoreLocation, setLocation, setPartitionFilter, setStoreFuncUDFContextSignature, setStoreLocation, setUDFContextSignature, shouldOverwrite, storeSchema, storeStatistics

Methods inherited from class FileInputLoadFunc: getSplitComparable

Methods inherited from class LoadFunc: getAbsolutePath, getCacheFiles, getLoadCaster, getPathStrings, getShipFiles, join, relativeToAbsolutePath, warn

protected IndexedStorage.IndexedStorageInputFormat.IndexedStorageRecordReader[] readers
protected int currentReaderIndexStart
protected byte fieldDelimiter
protected final int[] offsetsToIndexKeys
protected Comparator<IndexedStorage.IndexedStorageInputFormat.IndexedStorageRecordReader> readerComparator
public org.apache.hadoop.mapreduce.OutputFormat getOutputFormat()
- Description copied from interface: StoreFuncInterface
- Specified by: getOutputFormat in interface StoreFuncInterface
- Overrides: getOutputFormat in class PigStorage
- Returns: the OutputFormat associated with StoreFuncInterface

public org.apache.hadoop.mapreduce.InputFormat getInputFormat()

- Description copied from class: LoadFunc
- Overrides: getInputFormat in class PigStorage

public Tuple getNext() throws IOException

- Description copied from class: LoadFunc
- Overrides: getNext in class PigStorage
- Throws: IOException - if there is an exception while retrieving the next tuple

public void initialize(org.apache.hadoop.conf.Configuration conf) throws IOException

- Specified by: initialize in interface IndexableLoadFunc
- Parameters: conf - The job configuration object
- Throws: IOException

public void seekNear(Tuple keys) throws IOException
- Description copied from interface: IndexableLoadFunc
- Specified by: seekNear in interface IndexableLoadFunc
- Parameters: keys - Tuple with join keys (which are a prefix of the sort keys of the input data). For example, if the data is sorted on the columns in positions 2, 4, 5, any of the following Tuples are valid as an argument value:
  - (fieldAt(2))
  - (fieldAt(2), fieldAt(4))
  - (fieldAt(2), fieldAt(4), fieldAt(5))

  The following are some invalid cases:
  - (fieldAt(4))
  - (fieldAt(2), fieldAt(5))
  - (fieldAt(4), fieldAt(5))
- Throws: IOException - when the LoadFunc is unable to position to the required point in its input stream

public void close()
throws IOException

- Description copied from interface: IndexableLoadFunc
- Specified by: close in interface IndexableLoadFunc
- Throws: IOException - if the LoadFunc is unable to perform its close actions.

Copyright © 2007-2017 The Apache Software Foundation