Task Execute Timing out

chthroop
New Contributor III
6 years ago
Yes, a key gap in Pipeline Execute is the simple Batch function that the Execute Task has. I figured out how to simulate that functionality by adding a Group By N and a splitter to the sub-pipeline. This should be basic functionality for Pipeline Execute, IMHO.

Anyway, while this solves the timeout issue, I discovered the next issue: once you start a pipeline execution, all executions stay on the same node, even if your Snaplex has multiple nodes. So, to force SnapLogic to do workload management, I created a simple way to split the work in half, and created 2 separate tasks that use a pipeline parameter to call different groups of data.

The problem I have now is that even when I start the 2 tasks a couple of minutes apart, they both go to the exact same node. I have tested multiple times, and occasionally the work does get put on the two nodes, but it isn’t consistent. I need the platform to consistently realize that one node is at 80+% utilization and use the node that is at 5%…

Any ideas?
- tstack
  Former Employee
  6 years ago
  
  chthroop:
  
  Yes, a key gap in Pipeline Execute is the simple Batch function that the Execute Task has. I figured out how to simulate that functionality by adding a Group By N and a splitter to the sub-pipeline.
  
  Can you help us understand what you are trying to do overall? You’ve mentioned needing to use batches, can you give some more detail on why that is? It’s difficult to help without a better understanding of what you are trying to achieve.
  
  chthroop:
  
  I discovered the next issue: once you start a pipeline execution, all executions stay on the same node, even if your Snaplex has multiple nodes.
  
  Did you set the Snaplex property in the PipeExec snap? If the Snaplex property is left blank, it will only execute child pipelines on the local node.
  
  chthroop:
  
  The problem I have now is that even when I start the 2 tasks a couple of minutes apart, they both go to the exact same node.
  
  The second node might have some issue. Please open a support case so we can take a closer look and find an explanation.
  - chthroop
    New Contributor III
    6 years ago
    Hi,
    We don’t set the Snaplex property, as we expect SnapLogic to balance the workload across the nodes within the production Snaplex. In our testing, it appears that SnapLogic will eventually choose a different node when a node is busy, but it takes some time for SnapLogic to realize that a node is busy.
    
    A key use case of the batch functionality is to send a smaller number of records to a sub-pipeline. For example, in our case, we are pulling massive amounts of weather data for a list of 5-7k locations. If I send all 7k locations at once, the pull overwhelms the node, as the data pull size is about 260 million records. Therefore, we use a pipeline/sub-pipeline structure to pull 1000 locations at a time. However, this data must all be pulled within an hour, so I need to have multiple pulls going on at the same time. If I do it all in a serial pipeline, it takes 3 hours. However, by splitting the locations into 2 groups, I can run 2 nodes at once and get done in 1.5 hours…
    
    The Snaplex property doesn’t help because you can only specify a Snaplex, not a node within it. We need better load balancing on our nodes, so that the platform looks quickly at both nodes and decides which is the least used. Support is taking a look at this, and they pointed me to documentation which shows that load balancing is done by thread count. That is fine, but from our testing it clearly only looks at something like a 5+ minute resolution, instead of in real time when the job is being prepared.
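
The batching approach described in this thread (pull 1000 locations at a time, and split the batches between two tasks selected by a pipeline parameter) can be sketched roughly as below. This is only an illustration in plain Python, not SnapLogic configuration: `group_by_n`, `batches_for_task`, the `task_index` parameter, and the `loc-0000` style identifiers are all hypothetical names, and the round-robin assignment is one assumed way of dividing the batches.

```python
from typing import Iterator, List, Sequence


def group_by_n(items: Sequence[str], n: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most n items (similar in spirit to a Group By N step)."""
    if n <= 0:
        raise ValueError("n must be positive")
    for start in range(0, len(items), n):
        yield list(items[start:start + n])


def batches_for_task(
    items: Sequence[str], batch_size: int, task_index: int, task_count: int
) -> List[List[str]]:
    """Return the batches one task should process.

    task_index stands in for the pipeline parameter that tells each task
    which group of data to call; batches are assigned round-robin.
    """
    if not 0 <= task_index < task_count:
        raise ValueError("task_index must be in range(task_count)")
    return [
        batch
        for i, batch in enumerate(group_by_n(items, batch_size))
        if i % task_count == task_index
    ]


# Hypothetical input: 7000 location identifiers, pulled 1000 at a time by 2 tasks.
locations = [f"loc-{i:04d}" for i in range(7000)]
task_a = batches_for_task(locations, batch_size=1000, task_index=0, task_count=2)
task_b = batches_for_task(locations, batch_size=1000, task_index=1, task_count=2)

print(len(task_a), len(task_b))  # 4 3
```

With 7 batches the split is uneven (4 and 3), so the overall run time is set by the task with more batches. Whether the two tasks actually land on different nodes is still up to the platform's load balancing, which is the open question in this thread.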
