public class LogFormatLoader
extends nl.basjes.pig.input.apachehttpdlog.Loader
-- Point LOAD at any file that exists.
-- The loader does not read it when no fields are requested.
Example =
LOAD 'test.pig'
USING org.apache.pig.piggybank.storage.apachelog.LogFormatLoader(
'%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"'
);
DUMP Example;
The output of this command is a (large) example, written as actual Pig code, that demonstrates
how every possible field can be extracted. In normal use you trim this example
down to request only the fields your application really needs.
This loader implements projection pushdown, so there is little need to worry about the
fields you leave in.
This loader supports extracting things like an individual cookie or query string parameter
regardless of the position it has in the actual log line.
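As a rough sketch of what a trimmed-down request might look like, the script below asks for three fields, one of which is a single query string parameter. The input file name `access.log`, the parameter name `q`, the alias names, and the three field identifiers are all assumptions for illustration; the field identifiers follow the naming used by the underlying logparser project and should be copied from the example generated by the field-less `DUMP` rather than trusted as written here.

```pig
-- Sketch only: 'access.log', the 'q' parameter and the field names are
-- assumptions. Take the exact field names from the generated example.
Clicks =
    LOAD 'access.log'
    USING org.apache.pig.piggybank.storage.apachelog.LogFormatLoader(
        'combined',
        'IP:connection.client.host',
        'HTTP.URI:request.firstline.uri',
        'STRING:request.firstline.uri.query.q'
    ) AS (
        ClientHost:chararray,
        RequestUri:chararray,
        SearchQuery:chararray
    );

DUMP Clicks;
```

If the field names match, the `q` value would be returned whether it appears first, last, or anywhere in between in the query string, which is the position independence described above.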
In addition to the LogFormat specification used in your custom config, this parser also
understands the standard formats:
common
combined
combinedio
referer
agent
So this also works:
Example =
LOAD 'test.pig'
USING org.apache.pig.piggybank.storage.apachelog.LogFormatLoader('common');
DUMP Example;
This class is simply a wrapper around https://github.com/nielsbasjes/logparser, so more detailed documentation can be found there.

LoadPushDown.OperatorSet, LoadPushDown.RequiredField, LoadPushDown.RequiredFieldList, LoadPushDown.RequiredFieldResponse

| Constructor and Description |
|---|
| LogFormatLoader(String... parameters) |
getAdditionalDissectors, getFeatures, getInputFormat, getLogformat, getNext, getPartitionKeys, getRequestedFields, getSchema, getStatistics, getTypeRemappings, prepareToRead, pushProjection, setLocation, setPartitionFilter, setUDFContextSignature

getAbsolutePath, getCacheFiles, getLoadCaster, getPathStrings, getShipFiles, join, relativeToAbsolutePath, warn

public LogFormatLoader(String... parameters)
Copyright © 2007-2017 The Apache Software Foundation