Different results when saving/validating a pipeline vs. executing it

Hi Team,

Lately, I've observed different results being generated at compile time (i.e. saving a pipeline) vs. run time (i.e. executing a pipeline). Has anyone else observed this behavior?

For instance, consider a simple example of reading a flat file, doing some transformations, using joins, and exporting the result as a CSV/Excel file. When we save the pipeline and the Snap Execution is set to Validate & Execute, a flat file is generated in Manager with the default 50 records (this can range from 1 to 2,000 depending on each user's preview count setting). The file generated while saving the pipeline is different from the one generated after executing it.

What I observed was that the joins didn't work at compile time, whereas they worked at run time.

My use case isn't to export a file but to update new records in Salesforce. What's happening here is that at compile time a vital field's data goes blank, while at run time the data goes through as is. The concern: since I'm considering brand-new records ONLY, the compile-time execution ingests the wrong data, and run time then can't ingest the correct data because the record has already been ingested (and, as it's no longer a new record, run time won't fetch that data again).

Any help/thoughts on this would be highly appreciated.
Thanks!

Best Regards,
Darsh

Tagging the experts to get some help on this one.

CC: @robin @koryknick @dmiller @bojanvelevski @ptaylor

I defer to the developers on this one. I avoid Validate and Execute.

Sorry, I’m finding this a very confusing post.

Is the statement that the joins didn't work at compile time related to the paragraph before it or the one after it? I'm not seeing anything else about joins.

What do you mean by “going blank”? What’s supplying that data?

We need more context to make sense of this. What’s a brand new record? From where? It sounds like you might want to change the snap that updates Salesforce to Execute only so that validation doesn’t perform updates that you only want to happen during execution.

Please use “validate” and “execute”, not “compile” and “run”. There’s nothing in SnapLogic that’s really equivalent to “compiling” so that’s a confusing term.

Sincere apologies @ptaylor for the confusion. Here is how I understand the 3 buttons in SnapLogic:

  1. Validating a pipeline: Compile time
  2. Executing a pipeline: Run time
  3. Saving a pipeline with the last snap having snap execution as “Validate & Execute”: Compile time

To avoid confusion, I'll stop using "compile time" and "run time" henceforth. Let me explain again what the issue is and what I'm trying to achieve.

Upstream system: Snowflake
Downstream system: Salesforce

Issue: The data in a snap's output preview is different when the pipeline is saved or validated. I did a sanity test, and an ID record (i.e. col A) that should have some data (i.e. in col H) is coming through as "Null" values. The data I'm expecting comes from Joins, though.

The behavior is not the same when the pipeline is executed: with execution, the data comes through as expected.

Definition of brand-new records: With the help of an Inner Join, I'm finding IDs from Snowflake that are not in Salesforce, and I'm ingesting those NEW records into Salesforce with this pipeline (see the sketch below).
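
Conceptually (a minimal Python sketch with made-up IDs, not the actual Join snap configuration), the selection looks like this:

```python
# Conceptual sketch with made-up IDs: keep only the Snowflake IDs
# that do not already exist in Salesforce.
snowflake_ids = {"100", "101", "102"}   # IDs coming from Snowflake
salesforce_ids = {"101"}                # IDs already in Salesforce

new_ids = snowflake_ids - salesforce_ids
print(sorted(new_ids))                  # ['100', '102'] -> records to upsert
```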

Salesforce Create and Salesforce Update don't work efficiently (from what I've observed), so I've been using Salesforce Upsert for this operation.

The concerning piece is that the last snap of my pipeline is a Salesforce Upsert with Snap Execution set to "Validate & Execute". Whenever I make some minor changes and save my pipeline, those changes flow to the downstream system (i.e. Salesforce), and that's expected behavior. However, when saving/validating the pipeline, the data is NOT consistent for the joins (as explained before), and this inserts the record (if new) or updates it (if existing), since we're using an Upsert.

When the pipeline is then executed, the same record won't be inserted, as it's no longer a NEW record, and it won't be updated either, as the record hasn't received any update from the upstream systems. This is the data SnapLogic should have calculated correctly while saving/validating the pipeline but wasn't able to! (See the sketch below.)
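
To make the failure mode concrete, here's a hypothetical Python sketch (made-up data, not SnapLogic internals) of how a bad validation-time upsert gets stuck:

```python
# Hypothetical sketch: a Null ingested at validation time can never be
# repaired by a later execution, because the record is no longer "new".
snowflake = [{"id": "100", "phone": "123456"}]
salesforce = {}  # downstream store, keyed by id

# 1) Save/validate: the Join misfires, phone comes through as None, and
#    the Upsert (set to Validate & Execute) inserts the bad record.
salesforce["100"] = {"id": "100", "phone": None}

# 2) Execute: "brand new" means IDs not yet in Salesforce, so id 100 is
#    filtered out and the correct phone number never lands.
new_records = [r for r in snowflake if r["id"] not in salesforce]
print(new_records)        # [] -> nothing to upsert
print(salesforce["100"])  # {'id': '100', 'phone': None} stays wrong
```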

What I’m trying to achieve: Consistent data flow from Snowflake to Salesforce (with the help of Joins as those would be needed anyhow)

Happy to clarify further questions if any!

Solution: For now, I've changed my Salesforce Upsert snap's setting to "Execute ONLY" so that even when I save/validate my pipeline, those updates don't flow downstream. I would have expected joins to work the same in all scenarios: save/validate/execute. Is this a limitation of the tool, or am I doing something wrong here? I wouldn't be surprised if I'm missing a crucial step; I'm still learning SnapLogic, so by all means feel free to point me in the right direction (I won't be offended).

Apologies again for the confusion and looking forward to your thoughts on this one.

Best Regards,
Darsh

Remember that a Join will only join the data that's available from its input views, i.e. the output of the upstream snaps attached to its inputs. Those data sets are also constrained by the Preview Document Count when you're validating. So a Join that might normally find many matches when the complete data is available during an execution might find no matches when it's only dealing with a 50-record subset from each input view.

I think this might not be obvious. It's easy to get the misimpression that the Preview Document Count is only a limit on the amount of data being displayed, but that all the rest of the data is still processed. That's not the case. When a snap reaches that count on its output, it actually stops and doesn't output any more data, even invisibly, even if there's a lot more data available. This is important to understand.
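
As a rough illustration (made-up data in Python, not SnapLogic internals), here's how truncated inputs can lose matches that exist in the full data:

```python
# Rough illustration: an inner join over truncated inputs can miss
# matches that would be found over the full data sets.
PREVIEW_COUNT = 50

left_full  = [{"id": i, "name": f"acct-{i}"} for i in range(1000)]
right_full = [{"id": i, "phone": f"555-{i:04d}"} for i in range(500, 1000)]

def inner_join(left, right):
    phones = {r["id"]: r["phone"] for r in right}
    return [dict(rec, phone=phones[rec["id"]]) for rec in left if rec["id"] in phones]

print(len(inner_join(left_full, right_full)))  # 500 matches on a full execution

# Validation: each input stops at 50 records, so the first 50 left ids
# (0-49) never meet the first 50 right ids (500-549) -> 0 matches.
print(len(inner_join(left_full[:PREVIEW_COUNT], right_full[:PREVIEW_COUNT])))  # 0
```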

Could that explain what you’re seeing?

It does, a bit. However, as I mentioned before, while validating/saving the pipeline, ID = 100 will display Phone Number as "Null", say, whereas during pipeline execution it will display Phone Number as 123456.

Since I have access to the upstream systems, I knew ID=100 has the phone number 123456 (fetched via the Join in SnapLogic), but I don't get this while validating my pipeline. If ID=100 simply weren't available in the output preview, I'd have been fine with that, considering it didn't fall within the first 50 records. However, the preview shows ID=100 with Phone Number = "Null", and that's what concerns me!

Moreover, because the Join didn't work as expected, validating the pipeline inserted a new record in Salesforce for ID=100 (since ID=100 was a new record) with Phone Number = "Null" (the Salesforce Upsert's Snap Execution was set to Validate & Execute), which is weird behavior in my honest opinion. I've observed the same while saving the pipeline when Snap Execution is Validate & Execute.

I wouldn't have realized this, as I wasn't checking the output preview, but when I checked the records in Salesforce, I was seeing a lot of "Null" values. That made me go to the upstream system, randomly pick 10 IDs, and check their values. To my surprise, those IDs had data and were still going in as Null. After investigating, I concluded that making minor changes to the pipeline and then saving it was causing the issue, so I quickly disabled the Salesforce Upsert snap, made all the relevant changes, and changed the Snap Execution of the Salesforce Upsert from Validate & Execute to Execute ONLY. I still disable the Salesforce Upsert whenever the requirement changes and I have to modify my pipeline, as I feel the Joins are not functioning while validating/saving the pipeline although they do during execution (which is a bit absurd).

So in the output of the Join, the Phone Number you're expecting is null. Do you see the correct value of this Phone Number in one of the inputs to the Join? The preview will show you all the input data, so you should be able to see what should have been joined correctly. Can you see specific input records that should have been joined but weren't?

Yes, that's correct. The phone number I'm expecting for an ID (based on the data in the upstream systems) comes out as NULL. The behavior is intermittent: at times the Join's output preview will fetch the expected data; other times it won't. And since there are multiple Joins in my pipeline, the final Mapper, where I rename the columns to ingest into Salesforce, doesn't include the data fetched from the Join even though the Join's output preview displayed it.

Yes, I do see the correct value of Phone Number in one of the inputs to the Join.

Yes, I do see specific input records that should have been joined but are outputting Null values. To reiterate: this happens while saving/validating the pipeline, NOT during pipeline execution. With execution, it works as expected.

As the phone number values from the Joins come out as Null while saving/validating the pipeline, those values are passed to the next snap, i.e. the Salesforce Upsert (whose Snap Execution was set to Validate & Execute).

Can you please show us some screenshots? I’d like to see screenshots showing:

  • The pipeline.

  • The configuration settings of the Join.

  • The Pipeline Validation Statistics, particularly the Join.

  • The Pipeline Execution Statistics, particularly the Join.

  • Ideally, some preview data for the inputs and output of the Join in a case where it didn’t work correctly, at least showing a particular pair of records that should have been joined correctly and the output record where you’re seeing nulls instead of the values from the input records.

Also, please download the actual data from these previews and save the JSON. There may be a subtle difference in the data type of the values of the field you're joining on, which may become more evident if we can see the JSON (e.g. maybe it's the string "100" in one and the number 100 in the other).
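
For example (a contrived Python sketch, not SnapLogic's actual join logic), keys that look identical in a preview won't match if their types differ:

```python
# Contrived example: join keys that render identically in a preview
# still fail to match when one side is a string and the other a number.
left  = [{"id": "100", "name": "Acme"}]   # "100" as a JSON string
right = [{"id": 100, "phone": "123456"}]  # 100 as a JSON number

phones = {r["id"]: r["phone"] for r in right}
for rec in left:
    # "100" != 100, so the lookup misses and phone comes out as None/null.
    print(rec["name"], phones.get(rec["id"]))  # Acme None
```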

Sure @ptaylor, I can share some screenshots, but it will take some time, as I'll have to replicate the pipeline and disable the Salesforce snaps. The reason: as mentioned earlier, our workaround is to change the Snap Execution to "Execute Only" and just execute the pipeline instead of saving/validating it. Also, dev has been marked complete and we're currently testing, so I'll have to branch out the pipeline in a way that doesn't affect the ongoing testing.

You had a valid point on data type, where one side could be a number and the other a string. I double-checked this morning, as I wasn't confident whether it was string vs. number in this pipeline; the data type is string, so we're covered there. I'll share the screenshots as soon as I'm able to.

Appreciate your patience and help on this one.

@ptaylor, I have observed this behavior again: saving the pipeline ingested Null values into Salesforce even though the Salesforce snaps' Snap Execution was set to "Execute Only".

[Q-1]: Would the Salesforce Upsert snap insert/update records while saving a pipeline even though its Snap Execution has been set to "Execute Only"?

[Q-2]: Why would the Joins output Null data when we clearly know the data IS NOT NULL? Again, this happened while saving the pipeline; I forgot to disable the Salesforce Upsert snap while making a minor change in another snap, and 2k records have now been affected :frowning: