krish_pwc
New Contributor II
7 years ago

Parquet Reader & Writer through azure S3

For saving data in Parquet format in S3, below is the pipeline configuration.

This pipeline creates the schema metadata from the data itself, but it uses the Parquet data type "binary" for every column, which is equivalent to a string.

The first Mapper converts every value in the document to a string. We used the arrow function below in the first Mapper:
$.mapValues((value, key) => value == null ? "" : value.toString())
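For anyone who wants to check the expected result outside the pipeline, here is a rough Python sketch of what that expression does to one document. The function name `stringify_values` and the sample document are made up for illustration, and Python's `str()` can render some values differently from the expression language's `toString()` (booleans, for example), so treat it as an approximation only.

```python
def stringify_values(doc):
    """Approximate the first Mapper: None becomes "", anything else becomes a string."""
    return {key: "" if value is None else str(value) for key, value in doc.items()}


# Hypothetical sample document, not taken from the original pipeline.
sample = {"id": 7, "name": "abc", "score": None}
print(stringify_values(sample))  # {'id': '7', 'name': 'abc', 'score': ''}
```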

The second Mapper creates the metadata from the data. The arrow function used for this:
$.keys().map(x => {"col_name": x, "data_type": "binary"})
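As a rough illustration of the output shape, the same idea in plain Python would look like the sketch below. `build_schema` and the sample document are hypothetical names used only here; the real schema is produced by the Mapper expression above.

```python
def build_schema(doc):
    """Approximate the second Mapper: one schema entry per key, all typed as binary."""
    return [{"col_name": key, "data_type": "binary"} for key in doc]


# Hypothetical sample document (values already converted to strings).
sample = {"id": "7", "name": "abc"}
print(build_schema(sample))
# [{'col_name': 'id', 'data_type': 'binary'}, {'col_name': 'name', 'data_type': 'binary'}]
```

Because every column is declared as "binary", the values have to be strings already, which is why the string conversion runs first.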

For reading the file back, below is the configuration of the Parquet Reader:

Please note that "Use old data format" may be critical; without it the read may fail. This option is to be checked when the data are not nested.
