Forum Discussion

Roger667's avatar
Roger667
New Contributor III
3 years ago
Solved

Unable to write Salesforce data to S3 in Parquet format

I am using the Salesforce Read snap to read data and write it to S3 in Parquet format, but I am getting an error due to the 'picklist' data type, which is not compatible with Parquet. How do I handle this kind of data type problem?

12 Replies

  • manichandana_ch's avatar
    manichandana_ch
    New Contributor III

    Hi Roger667 ,

    You need to enable the second output view of the Salesforce Read snap (the schema output) to get the metadata of the columns, convert the types to datatypes that are compatible with the Parquet Writer, and then pass that metadata to the second input view of the Parquet Writer (enable the second input view, the schema input).
    Attaching a sample pipeline and the file used to convert the schema.
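    As a rough illustration only (the column headers and target types below are assumptions, not the attached file itself), the mapping file pairs each Salesforce data type with a type the Parquet Writer accepts, for example:

        datatype,convertdatatype
        picklist,string
        textarea,string
        boolean,boolean
        double,double
        datetime,bigint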

     

    Thanks,

    Mani Chandana Chalasani


  • Roger667's avatar
    Roger667
    New Contributor III

    Hi manichandana_ch 

    Thank you for your response. I successfully executed the SFDC_Parquet_Writer pipeline; however, the reader snap becomes unresponsive after processing a maximum of 2049 documents and then continues running indefinitely.
    I tried breaking the pipeline down, and the issue appears only when I add the Parquet Writer at the end.


    • manichandana_ch's avatar
      manichandana_ch
      New Contributor III

      Hi Roger667 

      Please add a Gate snap and a JSON Splitter to split the incoming data from the output of the Gate snap. Remove the Salesforce Read mapper and connect the output of the JSON Splitter to the Parquet Writer data input. In the JSON Splitter, give this expression: jsonPath($, "input0[*]")

      Here is a screenshot of the changes. Please connect the 395 port to the Parquet Writer data input.


      The issue is the design of the pipeline. The Salesforce Read snap starts sending data records to the Parquet Writer data input immediately, but the schema is not sent yet; it seems the snap only produces the metadata after all the data has been read and processed. At the same time, the data cannot move forward because the Parquet Writer will not start writing until it receives the schema details, so the two views block each other like an interlock. When a Gate snap is added to the Salesforce Read data output, it accumulates all of the input data before proceeding, so the data is buffered at the Gate snap while the metadata is produced on the schema output. The Parquet Writer then gets its schema and starts writing.
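      For reference, the Gate snap collects every input document and emits a single document with one array per input view, roughly like this (field names and values are only illustrative):

          { "input0": [ { "Id": "0015g00000AbCdE", "Status": "Open" }, { "Id": "0015g00000FgHiJ", "Status": "Closed" } ] }

      The JSON Splitter with jsonPath($, "input0[*]") then turns each element of input0 back into an individual document for the Parquet Writer data input.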

      Thanks & Regards,

      Mani Chandana Chalasani

      • Roger667's avatar
        Roger667
        New Contributor III

        Hi manichandana_ch
        Thanks for solving the buffer issue. This worked, but it led to another error. The Parquet Writer is not able to write boolean data types even though I have them in the Excel file. It is identifying boolean columns as string, even though I can see in the metadata that those columns are not string but boolean.

  • Roger667's avatar
    Roger667
    New Contributor III

    Thanks manichandana_ch. This solution worked for me. I have a question that may extend beyond the current topic: how should I design the pipeline to accommodate incremental loads?

    • manichandana_ch's avatar
      manichandana_ch
      New Contributor III

      Hi Roger667 

      After getting the data from the SF Read snap, you can take a copy, get the max value of LastModifiedDate (or any other audit date column), and store it in a file or a config table if you have one. Then, before the SF Read, you can read that file/table to get the lastmodifieddate value. Finally, in the SF Read snap you can add a filter to fetch only the records greater than that lastmodifieddate.

      The expression is added in the where clause of the SF Read snap, where $lastmodifieddate comes from the file/table read before the SF Read snap. If you prefer using a file, you can store it in the local SLDB or any other location available to you.
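      As a rough sketch only (the field name LastModifiedDate and the timestamp format are assumptions, adjust them to your object and stored value), the where clause expression could look like:

          "LastModifiedDate > " + $lastmodifieddate

      where $lastmodifieddate holds an unquoted ISO 8601 value such as 2023-01-01T00:00:00Z, which is how SOQL expects datetime literals.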

      Hope this helps!

      Thanks!


  • Roger667's avatar
    Roger667
    New Contributor III

    Thanks manichandana_ch. Everything is running smoothly, except there's a minor, peculiar issue with the mapping CSV file. After the left-join lookup mapper I am getting null values, even though the data type and its corresponding converted data type are present in the Excel file. I moved those records from the bottom to the top of the sheet and then it worked fine. Any thoughts on how this is happening? And when I moved several records to the top, I got another issue at the Parquet Writer.

     

    • manichandana_ch's avatar
      manichandana_ch
      New Contributor III

      Hi Roger667 

      I'm not sure about this behavior; I haven't faced it before and am not able to understand the issue. Maybe you can try copying the contents of the file into a new CSV and using that.

      The mapping CSV file is just one option for converting the metadata; you can also achieve this with an expression file. I found the CSV easier and used it, but you could instead maintain these details in an expression library and call it from a Mapper to get the Parquet-appropriate data types.
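      If you go the expression library route, a minimal sketch could look like the below (the library label types, the input field $dataType, and the target type names are assumptions used only to illustrate the idea):

          {
            toParquet: t => t == "picklist" || t == "textarea" || t == "string" ? "string"
                          : t == "boolean" ? "boolean"
                          : t == "double" || t == "currency" ? "double"
                          : "string"
          }

      Import the file under the pipeline properties (for example as types) and call it from a Mapper with lib.types.toParquet($dataType) to get the converted type for each column.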

      Thanks!

      • Roger667's avatar
        Roger667
        New Contributor III

        Hi manichandana_ch
        This error is coming from the datetime data type in the Salesforce object, which is mapped to bigint in the mapping sheet. But that mapping leads to loss of data, and the timestamp only comes through as the year. Can you suggest how to handle this datetime type?