Parquet Reader & Writer through azure S3

To save data in Parquet format in S3, use the pipeline configuration below.


This pipeline generates the metadata (schema) from the data itself, but it declares every column with the Parquet data type 'binary', which is equivalent to a string.

The first Mapper converts every value in the document to a string, replacing nulls with an empty string. We used the following arrow function in the first Mapper:
$.mapValues((value, key) => value==null?"":value.toString())
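
For readers who want to see the logic outside the pipeline, here is a rough Python sketch of what that expression does. The function name `stringify_values` and the sample row are made up for illustration, and Python's `str()` does not format every type the same way as `toString()` (for example booleans), so treat this as an approximation rather than an exact equivalent.

```python
def stringify_values(doc):
    """Return a copy of doc where every value is a string.

    None (null) becomes an empty string, mirroring the
    value==null ? "" : value.toString() expression above.
    """
    return {key: "" if value is None else str(value) for key, value in doc.items()}


# Hypothetical input document
row = {"id": 7, "name": "widget", "price": None}
print(stringify_values(row))
# → {'id': '7', 'name': 'widget', 'price': ''}
```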

The second Mapper creates the metadata from the data. The arrow function used for this is:
$.keys().map(x=>{"col_name":x,"data_type":"binary"})
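
As a rough illustration of the result, the same idea in Python would look something like this. The function name `build_schema` and the sample document are hypothetical; the only parts taken from the post are the `col_name` and `data_type` keys and the fixed `binary` type.

```python
def build_schema(doc):
    """Build one schema entry per key in doc, typing every column as 'binary'."""
    return [{"col_name": key, "data_type": "binary"} for key in doc]


# Hypothetical document whose values were already converted to strings
row = {"id": "7", "name": "widget"}
print(build_schema(row))
# → [{'col_name': 'id', 'data_type': 'binary'}, {'col_name': 'name', 'data_type': 'binary'}]
```

Because every column is declared as `binary`, the values need to be strings already, which is why the string conversion comes first.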

To read the file back, configure the Parquet Reader as shown below:

[Image: Parquet Reader configuration]

Please note that the "Use old data format" option may be critical; without it, the read may fail. Check this option when the data is not nested.
