Parquet Reader & Writer through azure S3

To save data in Parquet format to S3, below is the pipeline configuration.

This pipeline creates the metadata from the data itself, although it uses the Parquet data type 'binary' for every column, which is equivalent to string.

The first Mapper converts every value in the document to a string, replacing nulls with an empty string. We used the following arrow function in the first Mapper:
$.mapValues((value, key) => value == null ? "" : value.toString())
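
To check what this expression does outside SnapLogic, here is a rough Python equivalent. It is only a sketch: the function name stringify_values and the sample record are made up for illustration, and Python's str() does not format every type exactly as toString() does (booleans, for example).

```python
def stringify_values(doc):
    """Return a copy of doc with every value as a string; None becomes ""."""
    # Mirrors: value == null ? "" : value.toString()
    return {key: "" if value is None else str(value) for key, value in doc.items()}


# Hypothetical input document, not taken from the original pipeline.
sample = {"id": 42, "name": "widget", "price": None}
result = stringify_values(sample)
print(result)
```

Every value comes out as a string, which is what lets a single 'binary' column type describe the whole record.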

The second Mapper creates the metadata from the data by emitting one schema entry per key. The arrow function used for this:
$.keys().map(x => {"col_name": x, "data_type": "binary"})
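
The shape of the metadata this produces can be sketched in Python as follows. The function name build_schema and the sample record are hypothetical; only the col_name and data_type keys come from the expression above.

```python
def build_schema(doc):
    """Build one schema entry per key, typing every column as 'binary'."""
    # Mirrors: $.keys().map(x => {"col_name": x, "data_type": "binary"})
    return [{"col_name": key, "data_type": "binary"} for key in doc]


# Hypothetical document whose values were already converted to strings.
sample = {"id": "42", "name": "widget"}
schema = build_schema(sample)
print(schema)
```

Because the schema is derived from the keys of the incoming document, adding a new field to the data adds a column without any manual schema change.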

For reading the file back, below is the configuration of the Parquet Reader:

Please note that the "Use old data format" option may be critical; without it, the read may fail. Check this option when the data is not nested.
