Forum Discussion
That makes sense - realize that the Execute snap would be executed once for each input document - so in your screenshots, the statement is being executed 5,850 times.
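To make the "once per input document" behavior concrete, here is a minimal Python sketch (an assumption about the execution model, not SnapLogic internals): a snap with an input view enabled runs its statement once for every document that arrives, so 5,850 documents means 5,850 executions.

```python
# Sketch (assumption, not SnapLogic source): a snap with an input view
# executes its statement once per input document, so N docs => N executions.
def snowflake_execute(input_docs, run_statement):
    results = []
    for doc in input_docs:          # one statement execution per input document
        results.append(run_statement(doc))
    return results

calls = 0
def count_calls(doc):
    """Stand-in for the real statement; just counts invocations."""
    global calls
    calls += 1
    return doc

docs = [{"id": i} for i in range(5850)]
snowflake_execute(docs, count_calls)
print(calls)  # 5850
```

This is why the pipeline's runtime scales with the number of input documents rather than with the cost of a single query.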
- darshthakkar · 4 years ago · Valued Contributor
That’s quite a lot. The best development practice would be to disable the input view on the Snowflake Execute snap and use an Inner Join instead; if no output preview is generated, plug in the Inner Join later once development is finished.
- ptaylor · 4 years ago · Employee
If you just want to execute the Snowflake Execute once and it depends on the previous snaps finishing but doesn’t actually use the input documents, consider using the Gate snap and note its options.
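As a rough sketch of why the Gate snap helps here (this models the idea, not SnapLogic's exact output format): a gate-style snap drains its entire input stream and then emits a single document, so anything downstream runs exactly once while still waiting for the upstream snaps to finish.

```python
# Hedged sketch of gate-style behavior: consume the whole upstream stream,
# then emit one document, so a downstream statement executes exactly once.
# The output field name "input" is illustrative, not SnapLogic's actual schema.
def gate(input_docs):
    collected = list(input_docs)    # blocks until upstream is fully drained
    return [{"input": collected}]   # single output document

out = gate({"id": i} for i in range(5850))
print(len(out))  # 1
```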
- darshthakkar · 4 years ago · Valued Contributor
Noted, thank you both, @ptaylor @koryknick, for your comments.
Yes, I do need the input documents prior to the Snowflake Execute snap, not just to wait for the previous snaps to finish. That’s why I mentioned using an Inner Join. Let me shed some more light on what I was trying to achieve. I have a flat file with ~6k records that I’m interested in. Those ~6k records are IDs whose data I need from the Snowflake DB, which I retrieve by joining multiple tables in Snowflake. So I wrote a query in the Snowflake Execute snap with multiple joins, and since there were ~6k records, I couldn’t put a WHERE clause listing all 6k IDs inside the query. This is where I enabled the input view of the Snowflake Execute snap and added a clause like `AND a1.NAME=$rawKnowledgeArticles`, which restricts the query to those ~6k records. You guessed it right: NAME in Snowflake = rawKnowledgeArticles from the flat file.

I could have used an Inner Join (which I’m currently using), but the Inner Join doesn’t display an output preview due to the nature of my current data; it doesn’t generate a preview even if I raise the setting to 2000 documents. So I went ahead with development starting from the Snowflake Execute snap (thank you @ptaylor for your suggestion on another thread I opened about Joins), and it was pretty fast: 40s max and the entire pipeline would be executed. Once development was finished, I plugged the flat file output into the Snowflake Execute input, and that did run, to be honest, but took ~1.5hrs.
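The Inner Join alternative can be sketched as follows. This is a minimal illustration, assuming the join key matches as described in the post (NAME in Snowflake equals rawKnowledgeArticles in the flat file); the data is made up, and a hash-map lookup stands in for the Join snap.

```python
# Hedged sketch: instead of executing the Snowflake query once per flat-file
# ID, run it once and join its result set against the flat-file IDs locally.
# Field names follow the post; the row values are fabricated for illustration.
flat_file = [{"rawKnowledgeArticles": f"KA-{i}"} for i in range(6000)]
snowflake_rows = [{"NAME": f"KA-{i}", "BODY": f"article {i}"}
                  for i in range(0, 12000, 2)]   # only even IDs exist in the DB

# Build a lookup on the join key once (a hash join), then probe per flat row.
by_name = {row["NAME"]: row for row in snowflake_rows}
joined = [
    {**ff, **by_name[ff["rawKnowledgeArticles"]]}
    for ff in flat_file
    if ff["rawKnowledgeArticles"] in by_name     # inner join: keep matches only
]
print(len(joined))  # 3000
```

The point is that the expensive query runs once, and matching ~6k keys against its result set is cheap, which is consistent with the 13m vs ~1.5hrs difference described below.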
The workaround was to switch to an Inner Join once development was complete, since the output preview was no longer required; the Inner Join reduced the runtime from ~1.5hrs to 13m. 13m is still long, but the pipeline execution generates ~6k files in different formats, so that justifies the 13m and I’m fine with it. For testing, I used Head/Tail with 50 records, which took about 15s to run.