Pipeline taking too much time to execute

Hi Team,

The pipeline is taking too long to execute. Isn’t it strange that it spends 40+ minutes on a single File Reader snap? Snaps below for reference:

[screenshot: pipeline snaps]

CC: @dmiller @ptaylor @siwadon @koryknick

Pipeline execution completed after ~1.5 hours; screenshots below:

What would be the reason behind such a delay?

Regards,
Darsh

@darshthakkar - Bring up the Pipeline Execution Statistics and stay on the Snap Statistics tab. Find the snap(s) with the longest blue bar, indicating a snap that spends most of its time executing; this is usually the step that takes the longest to complete. See the image below as an example: note the two snaps whose bars are almost completely blue (File Writer and Account Read), as opposed to the other two: Mapper-Adding…, which has almost no total execution time, and Union, which is all green, indicating it’s waiting on input:

[example screenshot: Snap Statistics tab showing Duration bars]
You can also hover over the Duration (colored) bar to get the details of how much time is spent in the snap for Input, Execution, and Output.


@koryknick: Thank you for your response.
Is there a way to get back to the snap statistics if I have run the same pipeline a couple of times since the execution I want to investigate? I’d like to understand which snaps took the most time, as you suggested.

As you can see from the screenshot below, even though the number of documents processed is low (7,392), it still took a lot of time:

I was passing the output of a File Reader into a Snowflake Execute, which may be what took so long. After it ran for ~1.5 hours, I changed my design to use an Inner Join instead (I hadn’t planned to, because the Inner Join produces no output preview for my data, even with 2,000 documents selected in the output preview settings). With that change, the first iteration dropped to 13 minutes and eventually to 18-19 seconds (I trimmed the input to 50 records, hence the low time).

Yes - right there in the Dashboard: if you click “Completed” under the Status column, it will bring up the statistics for that execution.


Thank you @koryknick for the direction.
As you can see from the screenshot, it took a while on the File Reader and Excel Parser. What do you think is the reason behind that, and how can I avoid it in the future?

Is it because the output of the File Reader was going in as input to the Snowflake Execute?

The trouble snap is KAw&w/oAttachments - whatever that snap is doing, it’s taking all of the execution time. Snaps waiting on input and output are generally not the issue, since they aren’t actively doing much.


That is a Snowflake Execute snap. It works well when it has no input view, but it takes a very long time when the input view is enabled and documents are being fed into it.

That makes sense - realize that the Execute snap would be executed once for each input document - so in your screenshots, the statement is being executed 5,850 times.
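
Back-of-envelope, using the ~1.5-hour runtime reported above: 5,850 executions in roughly 5,400 seconds works out to about 0.9 seconds per statement, so the per-document round trips alone can account for the entire runtime.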


That’s quite a lot. The best development practice would be to disable the input view on the Snowflake Execute and use an Inner Join instead; if no output preview is generated, plug in the Inner Join later, once development has finished.

If you just want to execute the Snowflake Execute once and it depends on the previous snaps finishing but doesn’t actually use the input documents, consider using the Gate snap and note its options.

If using Gate, make sure you change the “Output Document Content” from “All Input Documents” to either of the other options to prevent everything from going to memory. You could also use the Tail snap if you don’t need to send the input documents to the Snowflake Execute snap.

I don’t know what you’re trying to do in this pipeline, but removing the input view from the Snowflake Execute would have it execute as soon as the pipeline starts, which might not be the behavior you want.


Yes, good points, Kory, especially about disabling the input view.

Tail would probably make more sense than Gate if there’s only one input view. I couldn’t recall the name of that option in the Gate snap (even though I added it!) so just said “and note its options”. 🙂

Noted, thank you both, @ptaylor @koryknick, for your comments.
Yes, I do need the input documents prior to the Snowflake Execute snap, not just to wait for it to finish. That’s why I mentioned using an “Inner Join”.

Let me shed some more light on what I was trying to achieve. I have a flat file with ~6k records I need to work with. Those records are IDs whose data I need from the Snowflake DB, which requires multiple joins across different tables. So I wrote a query in the Snowflake Execute with multiple joins, and since there were ~6k records, I couldn’t put a WHERE clause listing all of them inside the query. That’s why I enabled the input view of the Snowflake Execute and added a clause like `AND a1.NAME = $rawKnowledgeArticles`, so that only those ~6k records would be considered. You guessed it: NAME in Snowflake = rawKnowledgeArticles from the flat file.
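
For reference, a minimal sketch of the kind of statement I had in the Snowflake Execute snap; the table names and joins here are placeholders, and only the `a1.NAME = $rawKnowledgeArticles` filter comes from my actual pipeline:

```sql
-- Illustrative sketch only: every name except a1.NAME and
-- $rawKnowledgeArticles is made up for this example.
-- With the input view enabled, the snap substitutes $rawKnowledgeArticles
-- from each incoming document and runs the statement once per document.
SELECT  a1.NAME,
        a2.BODY,
        a3.OWNER
FROM    KNOWLEDGE_ARTICLES a1
JOIN    ARTICLE_VERSIONS   a2 ON a2.ARTICLE_ID = a1.ID
JOIN    ARTICLE_OWNERS     a3 ON a3.ARTICLE_ID = a1.ID
WHERE   a1.NAME = $rawKnowledgeArticles;  -- one round trip per flat-file record
```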

I could have used an Inner Join (which I’m currently using), but due to the nature of my current data, the Inner Join doesn’t display an output preview, even with the settings raised to 2,000 documents. So I started development with the Snowflake Execute (thank you @ptaylor for your suggestion on the other thread I opened about Joins), and it was quite fast: the entire pipeline executed in 40 seconds at most. Once development was finished, I plugged the flat-file output into the Snowflake Execute input; that did run, to be honest, but it took ~1.5 hours.

The plan was always to switch to an Inner Join once development was complete, since the output preview wouldn’t be required by then; the Inner Join reduced the time from ~1.5 hours to 13 minutes. Thirteen minutes is still long, but the pipeline generates ~6k files in different formats, which justifies it, and I’m fine with that. For testing, I used Head/Tail with 50 records, and that took about 15 seconds to run.

If you really want performance out of this, I would recommend loading the ID values into another Snowflake table and running your complex query with that table of IDs as the filtering condition in the query itself. The load of IDs into that table can be done with the Snowflake Bulk Load snap and should be quick; the query in the Snowflake Execute then only has to run once to return all the data.
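
For illustration, a rough sketch of that approach; the table and column names are assumptions, and in practice the staging table would be populated by the Snowflake Bulk Load snap rather than hand-written SQL:

```sql
-- 1. Staging table holding the ~6k IDs from the flat file,
--    loaded once up front (e.g., via the Snowflake Bulk Load snap).
CREATE TABLE IF NOT EXISTS STG_ARTICLE_IDS (NAME VARCHAR);

-- 2. The complex query then runs exactly once, with the filtering
--    done in-database instead of once per input document:
SELECT  a1.NAME,
        a2.BODY
FROM    KNOWLEDGE_ARTICLES a1
JOIN    ARTICLE_VERSIONS   a2 ON a2.ARTICLE_ID = a1.ID
JOIN    STG_ARTICLE_IDS    ids ON ids.NAME = a1.NAME;
```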


Agreed, this makes a lot of sense, but we currently don’t have the bulk snaps set up in our Org. In the interim, I believe the Inner Join will do the job, given the dramatic improvement in performance. Really appreciate your suggestions, though.