Re: Union Snap doesn't appear to be joining streams properly
So, I did that, and it’s still only showing the 2 records from the first load… It should show 20 here. I also tried a Gate only, instead of a Union + Gate, and each source became “input 0”, “input 1”, etc… and it was unclear how to turn that into a simple list of records…

Re: Union Snap doesn't appear to be joining streams properly
Hi, I am not trying to join disparate data sets to create a superset of fields. I have 10 Excel files in the exact same format that I simply want to “stack” into one load…

Union Snap doesn't appear to be joining streams properly
Hi, I am trying to join documents from 10 identical XLS files. The Union Snap should bring them all into one stream, but only the top file is coming through. It doesn’t work in validation or in execution.

Re: Microsoft Office 365 Snaps?
This would be fantastic… and SharePoint 365.

Re: Task Execute Timing out
Hi Chris, I agree. Tim, our testing shows that the PipelineExec loads all children onto one node. To Chris’ point, I would think that there should be another setting, “multi-node”, where you can specify whether you want the child pipelines spread across multiple nodes or not. I would imagine that multi-node would require a bit more communication between nodes, but would create better utilization. Once we get this project live, I will circle back and redo the testing so I can send you some screenshots / runtime data on the crashes. I did send the screenshot to support, which shows how, even though we have 2 nodes, one will have zero pipelines: all the pipelines are being allocated to the same node over a 2-minute period.

Re: Task Execute Timing out
Hi, thanks for the insights. Yes, we tried the PipeExec with pool. The problem is that the pool redlines the node and doesn’t distribute dynamically over the 2 nodes. We have basically figured out how to “hard force” resource loading.
We group the location data into 3 groups. We set up a pipeline parameter and 3 tasks, which invoke the master pipeline with the parameter. We learned that SnapLogic takes a while to update the Open Threads count, which is used for resource loading, so by starting our 3 tasks 5 minutes apart (we were able to get successful resource loading with 3 minutes, but chose 5 to be safe), we observe the system properly utilizing the nodes. When we did 2 minutes or less, the 3 tasks all got assigned to the same node. In terms of “overwhelms”, there were several different tests. All 7k locations at once: the system chugged along with mem/CPU at 95%, which caused other jobs to fail. Then, after about 250mm records loaded, we got an out-of-local-storage crash. When we used Pipeline Execute, all the pipelines spawned and again pegged mem/CPU at 95% (at least what we saw on the monitors). We also noticed degraded performance (it seems to slow down) with very large pull requests, 1-2 hours in. Our new approach is working. I hope that SnapLogic can update how fast the Open Threads variable is updated; this will improve resource load balancing for Cloudplex users! Thanks for the time, Tim, much appreciate your looking at this.

Re: Task Execute Timing out
Hi, we don’t set the Snaplex property, as we expect SnapLogic to balance the workload across the nodes within the production Snaplex. In our testing, it appears that SnapLogic will eventually choose a different node when a node is busy, but it takes some time for SnapLogic to realize that a node is busy. A key use case of the batch functionality is to send a smaller number of records to a sub-pipeline. For example, in our case, we are pulling massive amounts of weather data for a list of 5-7k locations. If I send all 7k locations at once, the pull overwhelms the node, as the data pull size is about 260mm records. Therefore, we use a pipeline/sub-pipeline structure to pull 1000 locations at a time.
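A rough sketch of the batching and grouping idea in plain Python, for illustration only. This is not SnapLogic configuration; the function names and the `loc-…` IDs are made up for the example, and the 7000-location list is just a stand-in for the real location list.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` items (a "Group by N" analogue)."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def split_into_groups(items: Sequence[T], groups: int) -> List[List[T]]:
    """Deal items round-robin into `groups` lists, one list per task."""
    if groups <= 0:
        raise ValueError("groups must be positive")
    return [list(items[i::groups]) for i in range(groups)]


# Hypothetical stand-in for the real location list.
locations = [f"loc-{i}" for i in range(7000)]

# One sub-pipeline call per batch of 1000 locations.
location_batches = list(batches(locations, 1000))

# One group per task; each task would receive its group via a pipeline parameter.
task_groups = split_into_groups(locations, 2)
```

Each batch would feed one sub-pipeline run, and each group would be handed to a separate task, so the work can land on different nodes.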
However, this data must all be pulled in within an hour, so I need to have multiple pulls going on at the same time. If I do it all in a serial pipeline, it takes 3 hours. However, by splitting the locations into 2 groups, I can run 2 nodes at once and get done in 1.5 hours… The Snaplex property doesn’t help because you can only specify a Snaplex. We need better load balancing on our nodes, so that it looks quickly at both nodes and decides which is the least used. Support is taking a look at this, and they pointed me to documentation which shows that load balancing is done by thread count. Which is fine, but clearly from our testing it is only looking at something like 5+ minute resolution, instead of in real time when the job is being prepared.

Re: Task Execute Timing out
Yes, a key gap in Pipeline Execute is the simple Batch function that the Task Execute has. I figured out how to simulate that functionality by adding a Group by N and a splitter to the sub-pipeline. This should be basic functionality for Pipeline Execute, imho. Anyway, while this solves the timeout issue, I discovered the next issue, which is that once you start a pipeline execution, even if your Snaplex has multiple nodes, all executions stay on the same node. So, to force SnapLogic to do workload management, I created a simple way to split the work in half, and created 2 separate tasks that use a pipeline parameter to call different groups of data. The problem I have now is that even when I start the 2 tasks a couple of minutes apart, they are both going to the exact same node. I have tested multiple times, and occasionally I do get work put on the two nodes, but it isn’t consistent. I need the platform to consistently realize that a node is at 80+% utilization and use the node that is at 5%… Any ideas?

Task Execute Timing out
Hi, we are using the Cloudplex version of SnapLogic. We have a pipeline built that downloads a very large weather data set.
We had to build the pipeline to execute in “batches” using the Task Execute Snap, because the 100mm+ row download fills up the temp file space if we do it all in one go… I am getting timeouts after 15 minutes and would like to either raise that timeout parameter or make it so that the parent pipeline doesn’t think the task pipeline has timed out. I was told that having an “open view” in the task pipeline would keep the parent from thinking a timeout has happened, but this isn’t working. Any ideas? Thanks.

Re: Convert Rows to Columns
I had the same need before myself. The Pivot is perfect for transforming spreadsheets into lists. I had found this answer before, but had to dig. The example in the Catalog, last I checked, was a complex approach that wasn’t nearly as easy to use. Snap should put their pivot example into the catalog… We use it every day.