How to make a child pipeline wait until the parent pipeline has completed

How can I make a child pipeline wait until the parent pipeline has completed? Currently the child pipelines are executed before the parent has finished processing all the documents.
Because of that, I am running into memory issues.

Is there any way to make the child pipeline wait until all the documents have been processed, and only then call the child pipeline?

First of all, don’t have preceding tasks send any output to following tasks unless it is the end of their process. If you do, the following tasks (which would deal with the child records) will run. Of course, you COULD provide aggregates of keys and accept that data to join in the child. This allows feedback, aggregation, RI validation, and processing of children only after the parents are processed. Of course, there is no value to that other than to isolate processes and allow distributing among multiple nodes. Oh, and doing that ALSO limits memory consumption if you use multiple JVMs or multiple nodes.

Let me try out that option…

Thanks.

You could also try writing the output to a file or database from the parent pipeline; that way you can force all processing to be done before the child is initiated.
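A minimal sketch of that staging idea, in plain Python rather than SnapLogic itself: the parent stage writes every processed record to a file and closes it, and only then does the child stage start reading. The function names (`run_parent`, `run_child`, `run_sequentially`) and the JSON-lines staging file are assumptions made for illustration; they are not SnapLogic APIs.

```python
import json
import os
import tempfile


def run_parent(records, staging_path):
    """Process every record and write it to the staging file (hypothetical parent stage)."""
    count = 0
    with open(staging_path, "w", encoding="utf-8") as out:
        for rec in records:
            processed = dict(rec, processed=True)  # stand-in for real post-processing
            out.write(json.dumps(processed) + "\n")
            count += 1
    # The file is closed here, so the parent is fully done before we return.
    return count


def run_child(staging_path):
    """Read the staged records one line at a time (hypothetical child stage)."""
    with open(staging_path, encoding="utf-8") as src:
        for line in src:
            yield json.loads(line)


def run_sequentially(records):
    """Run the parent to completion, then the child; returns the child's results."""
    fd, path = tempfile.mkstemp(suffix=".jsonl")
    os.close(fd)
    try:
        run_parent(records, path)
        return list(run_child(path))
    finally:
        os.remove(path)
```

Because the child only ever reads from the staging file, it holds one record at a time instead of competing with the parent for memory while the parent is still running.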

I probably should have said that. I have one case where I am doing precisely that. But I am kicking off the tasks to handle parameters and order, and to simplify scheduling, and it sounded like harriesh might be doing it almost like a subroutine.

My parent pipeline reads around 2 GB of data from a SAS file, post-processes the data, and does inserts into the corresponding tables. Since my child pipeline is executed continuously, memory usage shoots past 10 GB. If we were able to wait until the parent pipeline has finished reading, it would not go above 3 GB of memory.

Sometimes it also hangs without reaching the peak memory. The hanging stopped when I added a Sort snap at the end, but that does not help with memory either, as sorting itself takes a huge amount of memory.

Any idea whether we can write a script file that waits for the parent pipeline?

Does aggregation take as much memory as the Sort snap?

I had a memory problem with sorts as well. A lot of competing products do aggregations in groups and use sorting to facilitate that. It turns out that the SnapLogic aggregation is no different: it has group-by fields and the sorted-streams selection, which give that away. So the aggregation probably takes as much memory as a sorter, unless the input arrives in group order and you tell it the input is sorted. Since this can be bad if wrong, many products check the order coming in, so if the data is NOT sorted and you say it is, it will crash with an error saying the data is not sorted. If you say it is unsorted, it will do the sort first, regardless of whether the data is actually sorted.
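To illustrate the general pattern described above (not SnapLogic's actual implementation, which I have not seen), here is a rough Python sketch. `aggregate_sorted` streams over input that is already in group order, keeps only the current group in memory, and raises an error if the order is wrong; `aggregate_unsorted` sorts everything first, which is where the memory goes. Both function names and the `key`/`value` parameters are made up for this example.

```python
from itertools import groupby


def aggregate_sorted(rows, key, value):
    """Sum `value` per `key` for input that is already in key order.

    Only the current group is held in memory. Raises ValueError during
    iteration if the input turns out not to be sorted.
    """
    previous = None
    first = True
    for k, group in groupby(rows, key=lambda r: r[key]):
        if not first and k < previous:
            raise ValueError(f"input is not sorted: {k!r} came after {previous!r}")
        yield k, sum(r[value] for r in group)
        previous, first = k, False


def aggregate_unsorted(rows, key, value):
    """Sort first, then aggregate; the sort materialises every row in memory."""
    ordered = sorted(rows, key=lambda r: r[key])
    return aggregate_sorted(ordered, key, value)
```

The sorted path needs memory proportional to one group, while the unsorted path needs memory proportional to the whole input, which matches the behaviour suspected above.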

@Harriesh, you might look at using the Group By N snap with Group Size of 0 at the end of the parent pipeline to aggregate all the documents into one and then start the Child pipeline with a splitter to break it back down into individual documents. I don’t know how much that would help out memory-wise, but I suspect it might be better than a sort.
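As a rough illustration of why this acts as a barrier, here is a plain-Python sketch of the idea (not the Group By N snap itself). `group_by_n` and `split` are hypothetical names; the assumption, taken from the suggestion above, is that a group size of 0 means "collect everything into one group".

```python
def group_by_n(documents, group_size):
    """Collect documents into lists of `group_size`; 0 means one list holding everything."""
    if group_size < 0:
        raise ValueError("group_size must be >= 0")
    batch = []
    for doc in documents:
        batch.append(doc)
        if group_size and len(batch) == group_size:
            yield batch
            batch = []
    if batch:
        # With group_size == 0 this is the only yield, after the input has ended.
        yield batch


def split(groups):
    """Break each group back into individual documents (the splitter step)."""
    for group in groups:
        yield from group
```

With `group_size=0` nothing reaches the downstream step until the upstream has produced its last document, at the cost of holding every document at once. A non-zero size bounds each batch instead, trading the strict barrier for lower peak memory.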


@del Wow, that worked. I am also checking the memory case with the bigger data set. I guess it will mostly solve the issue. Thanks a lot.

I will mark it as solved once I confirm it does not impact the memory.

With Group Size = 0, it was better than sorting, but it was still taking a huge amount of memory.
But by varying the group size, I was able to control the memory used for execution.

It would have been nice if the group size were variable; currently it is a constant value and cannot be changed for an execution.