Create Avro Dynamic Schema

Majid
New Contributor III

Hi Team,

I would like some recommendations on a possible solution to the requirement below.

I have created a generic pipeline that reads 800 different Mainframe VSAM binary files with different formats, parses them using their associated copybooks, and validates every data field against a specified regular expression. If a file passes validation, I would like to create a target Avro file.

The Avro Formatter requires a predefined Avro schema. Because my pipelines are generic and can work with any VSAM file as long as you provide the file location in S3 and the associated copybook, I would like to know if there is any way to generate the Avro schema dynamically, based on either the copybook structure or the data output by the copybook parser.

Please note that the source files contain different record types, each with a different structure.

I would appreciate any directions.

2 REPLIES

ptaylor
Employee

If you generate a separate Avro schema for each data file, you could save the schema to a file on sldb, then invoke a child pipeline using Pipe Exec, setting a pipeline parameter to specify the Avro schema file’s path. In the Avro Formatter in the child pipeline, the schema setting would be an expression you’d set to the value of the pipeline parameter.

Majid
New Contributor III

Thank you @ptaylor … I am looking for directions on ways to generate the Avro schema dynamically from the data file or the copybook structure. Once the schema is generated and saved in sldb, I can use the above method to pass it to the child pipeline so that the Avro Formatter can use it when creating the Avro target file in S3.
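
One possible direction, offered only as a rough sketch: infer an Avro schema from a sample of the records that come out of the copybook parser, producing one schema per record type. This is plain Python, not a SnapLogic feature, and it rests on several assumptions that do not come from this thread: the parsed records are flat dictionaries (nested groups and OCCURS arrays are not handled), each record carries a discriminator field (called `record_type` here, a hypothetical name), and the namespace `example.vsam` is a placeholder. Copybook field names usually contain hyphens, which Avro names do not allow, so the sketch replaces them with underscores; the data keys would need the same renaming before formatting.

```python
import json
import re


def avro_name(raw):
    """Turn a copybook-style name (e.g. CUST-ID) into a valid Avro name."""
    name = re.sub(r"[^A-Za-z0-9_]", "_", raw)
    # Avro names must start with a letter or underscore.
    return name if re.match(r"[A-Za-z_]", name) else "_" + name


def avro_type(value):
    """Map a parsed Python value to an Avro primitive type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, bytes):
        return "bytes"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def infer_schema(name, records, namespace="example.vsam"):
    """Build an Avro record schema from a sample of flat parsed records."""
    seen_types = {}  # field name -> list of Avro types observed, in order
    for rec in records:
        for key, value in rec.items():
            types = seen_types.setdefault(key, [])
            t = avro_type(value)
            if t not in types:
                types.append(t)

    fields = []
    for key, types in seen_types.items():
        # A field absent from some records is treated as nullable.
        if any(key not in rec for rec in records) and "null" not in types:
            types.append("null")
        if "null" in types:
            # Put "null" first so the default value of null is valid.
            ordered = ["null"] + [t for t in types if t != "null"]
            field_type = ordered if len(ordered) > 1 else "null"
            fields.append({"name": avro_name(key), "type": field_type, "default": None})
        else:
            field_type = types[0] if len(types) == 1 else types
            fields.append({"name": avro_name(key), "type": field_type})

    return {"type": "record", "name": avro_name(name), "namespace": namespace, "fields": fields}


def schemas_by_record_type(records, type_field="record_type"):
    """Group records by a discriminator field and infer one schema per group."""
    groups = {}
    for rec in records:
        groups.setdefault(str(rec[type_field]), []).append(rec)
    return {rt: infer_schema(f"Record_{rt}", recs) for rt, recs in groups.items()}


# Hypothetical sample of parsed records with two record types.
sample = [
    {"record_type": "01", "CUST-ID": 123, "CUST-NAME": "Ann"},
    {"record_type": "02", "ORDER-AMT": 12.5, "NOTE": None},
]
schemas = schemas_by_record_type(sample)
print(json.dumps(schemas["01"], indent=2))
```

Each resulting dictionary can be serialized with `json.dumps` and saved as a schema file, which could then be passed to the child pipeline as described above. Inferring types from sample data only reflects the values actually seen, so deriving the schema from the copybook's PIC clauses would likely be more reliable where that is feasible.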