
Unable to write Salesforce data to S3 in Parquet format

Roger667
New Contributor III

I am using the Salesforce Read snap to read data and write it to S3 in Parquet format, but I am getting an error because the 'picklist' data type is not compatible with Parquet. How do I handle such data type problems?

1 ACCEPTED SOLUTION

Hi @Roger667 

Could you please check your Salesforce Read snap and make sure that 'Match datatype' is checked?

[Screenshot: Salesforce Read snap settings with 'Match datatype' enabled]

Thanks!



manichandana_ch
New Contributor III

Hi @Roger667 ,

You need to enable the second output view of the Salesforce Read snap (the schema output) to get the column metadata, convert the types to ones compatible with the Parquet Writer, and then pass that metadata to the second input view of the Parquet Writer (enable the second input view, the schema input).
Attaching a sample pipeline and the file used to convert the schema.
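
For illustration only (this is not the attached file; the type names below are assumptions based on the Salesforce describe metadata and generic Parquet logical types, not SnapLogic's exact schema document format), the conversion step essentially maps each Salesforce field type to a type the Parquet Writer accepts, with 'picklist' becoming a plain string. A minimal Python sketch:

    # Hypothetical sketch: map Salesforce field metadata (from the schema
    # output view) to Parquet-compatible types.
    SF_TO_PARQUET = {
        "picklist": "string",        # a picklist is just a constrained string
        "multipicklist": "string",
        "reference": "string",       # Salesforce record IDs
        "id": "string",
        "boolean": "boolean",
        "int": "int32",
        "double": "double",
        "currency": "double",
        "percent": "double",
        "date": "date",
        "datetime": "timestamp",
    }

    def to_parquet_schema(sf_fields):
        """sf_fields: list of {'name': ..., 'type': ...} column descriptions."""
        return [
            {"name": f["name"],
             "type": SF_TO_PARQUET.get(f["type"].lower(), "string")}  # default: string
            for f in sf_fields
        ]

    print(to_parquet_schema([{"name": "Status", "type": "picklist"}]))
    # -> [{'name': 'Status', 'type': 'string'}]

Unmapped types fall back to string, which is usually a safe default for Parquet.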


Thanks,

Mani Chandana Chalasani


Roger667
New Contributor III

Hi @manichandana_ch 

Thank you for your response. I successfully executed the SFDC_Parquet_Writer pipeline; however, the Salesforce Read snap becomes unresponsive after processing a maximum of 2049 documents and then continues running indefinitely.
I tried breaking the pipeline down, and the issue appears only when I add the Parquet Writer at the end.

[Screenshot of the pipeline]

Roger667
New Contributor III

Hi @manichandana_ch, @SpiroTaleski
Is this issue related to the Salesforce Read snap (some API-related problem) or to the Parquet Writer's end?

Hi @Roger667 

Please add a Gate snap, and a JSON Splitter to split the incoming data from the output of the Gate snap. Remove the mapper after the Salesforce Read snap and connect the output of the JSON Splitter to the Parquet Writer's data input. In the JSON Splitter, give this: jsonPath($, "input0[*]")
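
For context, the Gate snap buffers all upstream documents into a single document keyed by input view (input0, input1, ...), each key holding an array of the accumulated records, and jsonPath($, "input0[*]") fans that array back out into one document per record. A rough Python sketch of that split (the field values are made up for illustration):

    # Sketch of the Gate -> JSON Splitter hand-off, assuming the Gate snap
    # emits one document whose 'input0' key holds every buffered record.
    gate_output = {
        "input0": [
            {"Id": "001xx000003DGb0", "Status": "Open"},
            {"Id": "001xx000003DGb1", "Status": "Closed"},
        ]
    }

    # jsonPath($, "input0[*]") selects each element of the input0 array, so
    # the JSON Splitter emits one document per record, just like this loop:
    for record in gate_output["input0"]:
        print(record)
    # {'Id': '001xx000003DGb0', 'Status': 'Open'}
    # {'Id': '001xx000003DGb1', 'Status': 'Closed'}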

Here is a screenshot of the changes; please connect the 395 port to the Parquet Writer's data input.

[Screenshot: pipeline with a Gate snap and JSON Splitter inserted before the Parquet Writer]

The issue is the design of the pipeline. The Salesforce Read snap starts streaming data records to the Parquet Writer's data input immediately, but it does not emit the schema on its second output until all of the data has been read and processed. The data cannot move forward, though, because the Parquet Writer is waiting for the schema before it can start writing; the two snaps deadlock, like an interlock. When a Gate snap is added to the Salesforce Read snap's data output, it accumulates all of the input data before proceeding, so the records are buffered at the Gate snap while the metadata is emitted on the schema output. Once the Parquet Writer has the schema, it starts writing.
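
The same constraint exists in any Parquet library, since the schema must be known before the first row group is written. As an analogy only (SnapLogic does not use this code), here is how pyarrow requires the schema up front:

    import pyarrow as pa
    import pyarrow.parquet as pq

    # A ParquetWriter must be constructed with the full schema before any
    # rows are written -- which is why the Parquet Writer snap blocks until
    # its schema input view receives the metadata.
    schema = pa.schema([("Id", pa.string()), ("Status", pa.string())])
    writer = pq.ParquetWriter("accounts.parquet", schema)
    writer.write_table(pa.table({"Id": ["001xx000003DGb0"], "Status": ["Open"]}))
    writer.close()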

Thanks & Regards,

Mani Chandana Chalasani