Difference between revisions of "Flat File Parser"


Revision as of 08:58, 2 January 2011


Purpose

The service programs of this project help the developer to handle text files with fixed positional content more easily. This library does more than just copy data from file A to file B like CPYF.

The project was largely inspired by the open source Java Flat File Parser library on SourceForge.net.

Features

  • validation support
  • error handling
  • logging
  • various input providers
  • transaction handling
  • configurable modules
  • easy configuration
  • extensibility


Dependencies

This software depends on the following programs

  • Linked List
  • Linked List Utilities
  • Parameter Evaluation
  • Linked Map
  • Messages
  • Reflection

Optional dependencies are

  • Log4RPG


Usage

The main module is the parser. Everything else is plugged into the parser, so the first step is to create a parser instance.

 parser = ffp_create();

The next step is to determine where the data (the input) will come from. In this case it will come from a stream file on the IFS.

 filePath = '/public/share/input_file.txt' + x'00'; // the path must be null terminated
 inputProvider = ffp_input_stmf_create(%addr(filepath));
 ffp_setInputProvider(parser : inputProvider);

Now configure what line formats are to be expected.

 ffp_addLineFormatFromImage(parser : '100@@@@@@@@@@'); // 100 fixed and next 10 chars are variable content
 

So far the parser won't do anything with the data. We haven't yet configured what program should process the data. We will use the echo line processor. It will output every input to the screen.

 ffp_addLineProcessor(parser : ffp_processor_echo_create());

Parsing is started with the ffp_parse procedure.

 ffp_parse(parser);

Clean up. All allocated resources must be freed.

 ffp_finalize(parser);


Modules

Parser

The parser module is the main module in this software. It connects every other module and the API can be used to create and configure a parser instance.

TODO ILEDocs FFPARSE link

Line Formats

For every expected data constellation a line format must be registered at the parser. A line format is like a template which is laid over the data to see if it fits. The incoming data must match at least one registered line format. If no line format matches, the data is flagged as invalid and, depending on the configuration, parsing ends at this point.

Line formats can be a fixed value, a variable value or a mix of fixed and variable values. The variable part can be configured as an alphanumeric or numeric place holder.

Note: If more than one line format matches a data line, the first matching line format is used.

Fixed Line Format

If a data unit always has the same fixed content, a fixed line format can be used to define it, see ffp_addConstantLineFormat.

Variable Line Format

If a data unit can contain variable content, a variable line format is used. Variable line formats may also contain constant parts.

Numeric Variable

Variable numeric characters are defined with a #.

Alphanumeric Variable

Variable alphanumeric characters are defined with a @.

Examples:

 100##########
 200###@@@@@@@@@@@@@@@@@@@@
 @@@@@@@@@@@@@@@@@@@@@@@@@q@@@@@@@@@q
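
The matching rules above can be illustrated with a small language-neutral sketch. It is written in Python only to show the idea and is not the library's RPG implementation. It assumes that # accepts one digit, that @ accepts any single character, that every other character must appear literally, and that the data must be exactly as long as the image. The function names are hypothetical.

```python
from typing import List, Optional


def matches(image: str, line: str) -> bool:
    """Return True if the line fits the line format image (assumed semantics)."""
    if len(line) != len(image):
        return False
    for pattern_char, data_char in zip(image, line):
        if pattern_char == "#":
            # numeric placeholder: exactly one decimal digit
            if data_char not in "0123456789":
                return False
        elif pattern_char == "@":
            # alphanumeric placeholder: any character is accepted in this sketch
            continue
        elif pattern_char != data_char:
            # fixed part: must appear literally
            return False
    return True


def first_matching_format(images: List[str], line: str) -> Optional[str]:
    """Return the first registered image that matches, or None for invalid data."""
    for image in images:
        if matches(image, line):
            return image
    return None


formats = ["100##########", "100@@@@@@@@@@"]
print(first_matching_format(formats, "1000000004711"))  # both fit, the first one is used
print(first_matching_format(formats, "100ABCDEFGHIJ"))  # only the @ image fits
print(first_matching_format(formats, "999"))            # no image fits
```

Registering the more specific image first matters here, because the first matching line format wins.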


Fields

The line format can be split into various parts (fields), see ffp_addLineFormatField. Fields can be defined for every line format. Fields can only be defined in sequence (one after another). There cannot be a gap between two fields.

The procedure ffp_util_getLineFormatFieldData returns the data of the passed field from the current data unit.
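
As a rough illustration of sequential fields without gaps, the following Python sketch cuts a data unit into fields by their lengths. It only models the concept; split_fields is a hypothetical helper and not part of the FFP API.

```python
from typing import List


def split_fields(line: str, lengths: List[int]) -> List[str]:
    """Cut a line into consecutive fields; each field starts where the previous one ends."""
    fields = []
    offset = 0
    for length in lengths:
        fields.append(line[offset:offset + length])
        offset += length
    return fields


# a record id of 3 characters followed by a 13 character item number
print(split_fields("2104006381333931", [3, 13]))
```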


Validators

A data unit that matches a line format is not necessarily correct. Validators are there to further check the correctness of the data. A validator gets the data of a specific field, checks it and returns whether it is valid. The validator only knows about the field to check. It does not know any other data from that data unit, so it cannot make any decisions depending on other data from that data unit.

It also cannot change any data.

One or more validators can be defined per field up to a maximum of 99 validators for one line format.

Invalid data causes the current transaction to be aborted (depending on the configuration).

User-defined data can be passed when a validator is created. It is then available on each validation.

TODO ILEDocs FFPVAL

DateValidator

The DateValidator checks if the data from a field is a valid date. The date format to be used can be passed on validator creation. All native date formats are supported (e.g. *DMY, *EUR, *ISO, *DMY0, ...). If no date format is passed, the *ISO date format is used.
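
A date check of this kind could look roughly like the following Python sketch. It is not the validator's actual code. The mapping of the format names to patterns (including the separators) is an assumption based on the usual layout of these formats, and only four formats are covered.

```python
from datetime import datetime

# Assumed layouts for a few native date format names
DATE_PATTERNS = {
    "*ISO": "%Y-%m-%d",
    "*EUR": "%d.%m.%Y",
    "*DMY": "%d/%m/%y",
    "*DMY0": "%d%m%y",
}


def is_valid_date(value: str, date_format: str = "*ISO") -> bool:
    """Return True if value is a real calendar date in the given format."""
    pattern = DATE_PATTERNS[date_format]
    try:
        parsed = datetime.strptime(value, pattern)
    except ValueError:
        return False
    # formatting the parsed date again rejects input that is not zero padded
    return parsed.strftime(pattern) == value


print(is_valid_date("2011-01-02"))          # *ISO is the default
print(is_valid_date("2011-02-30"))          # no such day
print(is_valid_date("02.01.2011", "*EUR"))
```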

TODO ILEDocs FFPDATE

RangeValidator

The RangeValidator checks if the data from a field lies within a range of values. The range must be passed on validator creation. Only numeric ranges are supported and can be specified with two integers or two decimals. The lower and the upper limit must be separated by a space. Up to 10 decimal positions are supported for decimal values.
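
The range check can be sketched as follows. This is an illustrative Python model, not the RPG implementation. It assumes that both limits are inclusive and that non-numeric field data is simply invalid. make_range_validator is a hypothetical name.

```python
from decimal import Decimal, InvalidOperation


def make_range_validator(parameter: str):
    """Build a validator from a 'lower upper' parameter string such as '0 999999'."""
    lower_text, upper_text = parameter.split()
    lower, upper = Decimal(lower_text), Decimal(upper_text)

    def validate(field_data: str) -> bool:
        try:
            value = Decimal(field_data.strip())
        except InvalidOperation:
            return False  # not a number at all
        if not value.is_finite():
            return False  # rejects NaN and Infinity
        return lower <= value <= upper  # limits assumed to be inclusive

    return validate


validate_amount = make_range_validator("0 999999")
print(validate_amount("0001234"))
print(validate_amount("1000000"))
print(validate_amount("ABC"))
```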

TODO ILEDocs FFPRNGV

EanValidator

The EanValidator checks if the data is a valid EAN. It checks if the check digit is valid for this EAN.
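
For reference, the standard EAN check digit calculation can be sketched in Python as shown here. The validator itself is still marked as TODO, so this is not its implementation. Accepting both 8 and 13 digit codes is an assumption.

```python
def is_valid_ean(code: str) -> bool:
    """Check the check digit of an EAN-8 or EAN-13 code."""
    if len(code) not in (8, 13) or any(c not in "0123456789" for c in code):
        return False
    digits = [int(c) for c in code]
    check_digit = digits.pop()
    # starting from the rightmost data digit the weights alternate 3, 1, 3, 1, ...
    total = 0
    for index, digit in enumerate(reversed(digits)):
        total += digit * (3 if index % 2 == 0 else 1)
    return (10 - total % 10) % 10 == check_digit


print(is_valid_ean("4006381333931"))
print(is_valid_ean("4006381333932"))
```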

TODO ILEDocs FFPEAN TODO implementation


Input Providers

The parser does not only process stream files. It does not even specify where the data comes from. That is handled by another module, the input provider (or rather its implementations). The input provider module itself also does not specify where the data comes from. It acts as a link between the parser and the real implementation which supplies the data, and it forwards all read requests to that implementation. How the real data source is read is of no concern to the parser or the input provider module but is handled solely by the input provider implementations.

TODO ILEDocs FFPIN link

Stream File Input Provider

The stream file input provider is an implementation of the input provider. It takes a file path as a parameter on the create procedure and reads the data from the file line by line. The line ending can be either CRLF or LF.
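
The handling of both line endings can be sketched like this. It is a Python model of the idea and not the provider's code. The UTF-8 decoding is an assumption made for the example, and read_lines is a hypothetical name.

```python
import io
from typing import BinaryIO, Iterator


def read_lines(stream: BinaryIO) -> Iterator[str]:
    """Yield the lines of a binary stream without their CRLF or LF terminator."""
    for raw_line in stream:
        if raw_line.endswith(b"\r\n"):
            raw_line = raw_line[:-2]
        elif raw_line.endswith(b"\n"):
            raw_line = raw_line[:-1]
        yield raw_line.decode("utf-8")  # encoding assumed for this sketch


# an in-memory stream stands in for the stream file; the last line has no terminator
sample = io.BytesIO(b"1000000001\r\n2104006381333931\n99")
print(list(read_lines(sample)))
```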

TODO ILEDocs link

Database Input Provider

The database input provider is an implementation of the input provider. It reads the data from the database table FFPMDE and returns the data record by record.

TODO ILEDocs link


Line Processors

The real processing of the data is performed by the line processors. The main module (PARSER) passes the data to the line processors.

Normally one line processor is registered at the parser, though multiple line processors can be registered. A data unit is passed to all registered line processors for processing.

TODO ILEDocs FFPPRC

EchoLineProcessor

This implementation displays all data on the console together with the matching line format id and data length.

TODO ILEDocs FFPPRCE

GroupingLineProcessor

If the data for one transaction are composed of multiple lines these have to be buffered at some point. This can be achieved with a grouping line processor. It buffers the data until the end of one transaction has been reached. The borders of a transaction are defined by line formats, see ffp_processor_grouping_setStartLineFormat and ffp_processor_grouping_setEndLineFormat.

The buffered/grouped data is then passed to the processing execution program (or executor, see ffp_processor_grouping_setExecutor). The executor will get a logger instance and the data (with line formats, see data structure ffp_processor_grouping_line).

The executor must implement the following parameters:

  • logger - pointer (const)
  • data - pointer (const)

Data is a linked list where each entry is a data structure in the format ffp_processor_grouping_line.
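
The buffering behaviour can be modelled with the following Python sketch. It is not the RPG implementation: a plain list stands in for the linked list, a tuple stands in for the ffp_processor_grouping_line data structure, and ignoring lines outside a transaction is an assumption. Only the executor parameters (logger, data) are taken from the description above.

```python
class GroupingLineProcessor:
    """Buffers lines from the start line format up to the end line format."""

    def __init__(self, start_format, end_format, executor, logger=None):
        self.start_format = start_format
        self.end_format = end_format
        self.executor = executor
        self.logger = logger
        self._buffer = None  # None means that no transaction is open

    def process(self, line_format, data):
        if line_format == self.start_format:
            self._buffer = []  # a new transaction begins
        if self._buffer is None:
            return  # line outside of a transaction, ignored in this sketch
        self._buffer.append((line_format, data))
        if line_format == self.end_format:
            group, self._buffer = self._buffer, None
            self.executor(self.logger, group)  # hand over the complete transaction


groups = []
processor = GroupingLineProcessor("mdeid", "end", lambda logger, data: groups.append(data))
for line_format, data in [
    ("mdeid", "1000000001"),
    ("item", "2104006381333931"),
    ("end", "99"),
    ("item", "2104006381333931"),  # stray line between two transactions
    ("mdeid", "1000000002"),
    ("end", "99"),
]:
    processor.process(line_format, data)
print(len(groups), [len(group) for group in groups])
```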

TODO ILEDocs FFPPRCG

Logger

The parser, line processors and validators can send messages which should be logged. There is no default mechanism to log messages. A log message is composed of the message content, extra data and a log level.

Note: The logging of an error message does not abort any transaction. It does just what it does, logging.

LogLevel

The current log levels are

  • 1 = Error
  • 10 = Info
  • 20 = Debug
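
As a language-neutral illustration, a log message with its three parts and the numeric levels listed above might be modelled as follows. This is a Python sketch with hypothetical names, not the FFPLOG interface.

```python
from dataclasses import dataclass
from typing import List, Optional

# numeric values taken from the list above
LOG_LEVEL_ERROR = 1
LOG_LEVEL_INFO = 10
LOG_LEVEL_DEBUG = 20


@dataclass
class LogMessage:
    level: int
    content: str
    extra_data: Optional[str] = None


class ListLogger:
    """Collects messages in memory; logging never aborts a transaction."""

    def __init__(self) -> None:
        self.messages: List[LogMessage] = []

    def log(self, level: int, content: str, extra_data: Optional[str] = None) -> None:
        self.messages.append(LogMessage(level, content, extra_data))


logger = ListLogger()
logger.log(LOG_LEVEL_ERROR, "No line format matches", "XYZ4711")
logger.log(LOG_LEVEL_DEBUG, "Parsing finished")
print([message.level for message in logger.messages])
```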

TODO ILEDocs FFPLOG

Note: The parser can run even without a configured logger. In that case no messages will be displayed (besides the system messages).

DatabaseLogger

The default implementation logs all messages to a database file, FFPLOG.

Log4RPG Logger

This implementation uses the Log4RPG Framework to do the logging.

TODO implementation

JobLogLogger

This implementation outputs all log messages to the job log (via the MESSAGE service program). All messages are of type *INFO. If data was added to the logger call then it will also be added to the job log message.

You can set the logger with the standard FFP API ffp_setLogger.

 ffp_setLogger(parser : ffp_logger_joblog_create());

Serializer

A serializer saves the data to a data source so that it can later be reviewed or corrected and then processed again. Normally only invalid data will be serialized. The serializer is used if a data unit does not match any line format or if the data cannot be validated successfully by the configured validators. The serializer is also called on any unhandled error during the call to the line processor.

If serialization should not be done by the parser but by the line processors, the switch processInvalidLines can be set to *on. This makes the parser pass invalid data to the line processors with an invalidation flag. This is useful if the data is to be buffered by a grouping line processor.
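
The decision described above can be summarised in a short Python sketch. It models the control flow only. handle_line and its parameters are hypothetical names, and the processors and the serializer are plain callables here.

```python
def handle_line(line, is_valid, processors, serializer, process_invalid_lines=False):
    """Route one data unit to the line processors or to the serializer."""
    if not is_valid and not process_invalid_lines:
        serializer(line)  # the parser serializes invalid data itself
        return
    try:
        for process in processors:
            # with processInvalidLines on, invalid data is passed on with a flag
            process(line, not is_valid)
    except Exception:
        serializer(line)  # unhandled error in a line processor


processed, serialized = [], []
processors = [lambda line, invalid: processed.append((line, invalid))]

handle_line("1000000001", True, processors, serialized.append)
handle_line("XYZ", False, processors, serialized.append)
handle_line("XYZ", False, processors, serialized.append, process_invalid_lines=True)
print(processed, serialized)
```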

TODO ILEDocs FFPSER

DatabaseSerializer

This implementation saves all data of a transaction into the database file FFPTRAN. Each transaction will get a unique id from the SQL sequence FFPSERD.

TODO ILEDocs FFPSERD


ConfigurationProvider

A parser instance can be much more easily created by the use of a configuration provider. For this to work a configuration provider has to be created and registered at the parser instance, see ffp_setConfigurationProvider. The ffp_loadConfiguration procedure configures the parser instance with the use of the public API of the parser.

XmlConfigurationProvider

This implementation reads an XML file and configures the parser according to it. The XML file must correspond to the DTD ffp.dtd. TODO link

Example:

 <?xml version="1.0" ?>
 <!DOCTYPE config SYSTEM "ffp.dtd" >
 <config
    ignoreBlankLines="yes"
    processInvalidLines="yes">
 
   <lineFormat name="mdeid" type="var" imageString="100#######" >
     <field length="3" />
     <field length="7" />
   </lineFormat>
   <lineFormat name="item" type="var" imageString="210@@@@@@@@@@@@@" >
     <field length="3" />
     <field length="13">
       <validator serviceprogram="FFPEAN" procedure="ffp_validator_ean_create" />
     </field> 
   </lineFormat>
   <lineFormat name="amount" type="var" imageString="220@@@@@@@" >
     <field length="3" />
     <field length="7">
       <validator serviceprogram="FFPRNGV" procedure="ffp_validator_range_create"
          parameter="0 999999"/>
     </field>
   </lineFormat>
   <lineFormat name="end" type="const" imageString="99" />
 
   <inputProvider serviceprogram="FFPINDB" procedure="ffp_input_db_create" />
 
   <logger serviceprogram="FFPLOGD" procedure="ffp_logger_db_create" />
 
 </config>
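
To show how such a file maps to line formats and fields, here is a small Python sketch that reads a shortened version of the example. It is not the XmlConfigurationProvider. The check that the field lengths do not exceed the image length is an assumption about what a sensible configuration looks like, and read_line_formats is a hypothetical name.

```python
import xml.etree.ElementTree as ET

CONFIG = """<config ignoreBlankLines="yes" processInvalidLines="yes">
  <lineFormat name="mdeid" type="var" imageString="100#######">
    <field length="3" />
    <field length="7" />
  </lineFormat>
  <lineFormat name="end" type="const" imageString="99" />
</config>"""


def read_line_formats(xml_text):
    """Return one dictionary per lineFormat element with its image and field lengths."""
    root = ET.fromstring(xml_text)
    formats = []
    for element in root.findall("lineFormat"):
        image = element.get("imageString")
        lengths = [int(field.get("length")) for field in element.findall("field")]
        if sum(lengths) > len(image):
            raise ValueError("fields of " + element.get("name") + " exceed the image")
        formats.append({
            "name": element.get("name"),
            "type": element.get("type"),
            "image": image,
            "field_lengths": lengths,
        })
    return formats


for line_format in read_line_formats(CONFIG):
    print(line_format)
```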


Documentation

ILEDocs is used to create the API documentation for the various modules. It can be viewed at iledocs.sourceforge.net/docs/ .


Download

The software is currently under development on hardware made available by the open source initiative of idevcloud.com. Future releases can be downloaded from http://www.rpgnextgen.com.


Examples

License

This software is released under the GNU General Public License v3.0.


Links