Async issue with pool size option in Pipeline Execute Snap

walkerline117
Contributor

In my pipeline, I use the Pipeline Execute Snap to run a sub-pipeline.

I like using the pool size option in this Snap so that it can run multiple instances of the sub-pipeline concurrently.

However, as with any concurrent processing, there is a small chance that two or more concurrent executions write the same data to the same place (e.g. a database) at the same time, even though my pipeline checks whether the data already exists in the DB before inserting. (If there is no data, insert; if there is data, update.)

Is there any locking or thread-safe mechanism (like a Java lock) in SnapLogic that can prevent this kind of threading issue?
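For reference, the kind of guard I have in mind would look something like this in plain Java. This is only an illustration of the check-then-act race I'm worried about, not SnapLogic code; LocationDao is a hypothetical data-access interface.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical data-access interface for the location table, just for illustration.
interface LocationDao {
    boolean exists(String location);
    void insert(String location);
    void update(String location);
}

public class LocationWriter {

    private final ReentrantLock lock = new ReentrantLock();
    private final LocationDao dao;

    public LocationWriter(LocationDao dao) {
        this.dao = dao;
    }

    // Unsafe: two threads can both see "not exists" and both insert the same location.
    public void upsertUnsafe(String location) {
        if (!dao.exists(location)) {
            dao.insert(location);   // duplicate insert possible here
        } else {
            dao.update(location);
        }
    }

    // Safe within one JVM: the existence check and the write happen under one lock.
    public void upsertLocked(String location) {
        lock.lock();
        try {
            if (!dao.exists(location)) {
                dao.insert(location);
            } else {
                dao.update(location);
            }
        } finally {
            lock.unlock();
        }
    }
}
```

Of course, a lock like this only helps when all the writers share one JVM, which is why I'm asking whether SnapLogic offers something equivalent across pipeline executions.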

Thanks

1 ACCEPTED SOLUTION

Yes, provided you make sure the location exists before this data is written.

I am pretty sure you won't have to insert hundreds of locations on every load. It is a one-time thing.

Glad you figured it out!
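If it helps, that pre-seeding can be done as a small one-off step before the parallel load; here is a rough sketch in Java (the DAO interface and the row shape are hypothetical):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// One-off pre-seeding step: make sure every distinct location already exists
// before the parallel employee load starts, so no execution needs to insert one.
public class LocationPreseeder {

    interface LocationDao {          // hypothetical data-access interface
        boolean exists(String location);
        void insert(String location);
    }

    public static void preseed(List<String[]> employeeRows, LocationDao dao) {
        // Each row is assumed to look like [id, name, location].
        Set<String> distinctLocations = new LinkedHashSet<>();
        for (String[] row : employeeRows) {
            distinctLocations.add(row[2]);
        }
        // Serial inserts, so there is no race even if a location appears many times.
        for (String location : distinctLocations) {
            if (!dao.exists(location)) {
                dao.insert(location);
            }
        }
    }
}
```

After that runs once, the parallel employee load only ever updates existing locations, so the check-then-insert race disappears.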


7 REPLIES

nganapathiraju
Former Employee

Is that your assumption?

What happens is that the incoming documents are sent round-robin to this Snap even when the pool size is enabled, so the same document will never be sent to two execution instances.

Unless you have duplicates in your data, you will be safe with the pool and achieve true concurrency.

That's what I'm trying to say: we do have duplicates in my data…

In my case, it's the firm's employee data, where the employee's location is one of the columns in the row (document).

e.g.
1, Employee1, New York
2, Employee2, New York

So the sub-pipeline first checks whether that employee's location is in the DB: if it is, it updates the location; if it is not, it inserts the location into the DB.

In the above case, the location in both documents is the same. Let's say the database does not have New York initially. Even with round-robin distribution, if the first execution's insert of New York is delayed and does not complete before the second execution does its database check, the second execution will insert the same location into the DB.

Thanks

If it's the same data and you're doing an update, it seems like it'd be a non-issue.

I'd like to understand your use case more. You have a child pipeline running to do individual updates to a database? That'd eat a lot of resources spinning those parallel executions up and down, even if you reuse them. From what you've described here, it sounds like you're injecting parallelism just for the sake of it.

It'd most likely be more efficient and performant to use a single DB Snap to do your updates and adjust the batch settings on your DB account.

Also, you shouldn't think of a pipeline as a thread, but as a process that is a collection of threads. In reality, each Snap in a pipeline has its own thread. Those Snaps combine to perform a semi-serial execution, and pipeline executions are generally not aware of each other.

Best,

Unfortunately, inserting the data into the DB is not done via SQL; we have to call a SOAP API to insert/update a single location in that system's DB.
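For what it's worth, on the client side the pattern I'm considering is to remember which locations are already known, so each distinct location triggers at most one SOAP insert even when documents arrive concurrently. A rough sketch only; SoapLocationClient is a hypothetical stand-in for our SOAP client:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class DedupingLocationClient {

    // Hypothetical wrapper around the SOAP endpoint that inserts/updates one location.
    interface SoapLocationClient {
        void insertLocation(String location);
        void updateLocation(String location);
    }

    private final SoapLocationClient soap;
    // Locations known to exist already (seeded up front) or inserted during this run.
    private final Set<String> known = ConcurrentHashMap.newKeySet();

    public DedupingLocationClient(SoapLocationClient soap, Set<String> existingLocations) {
        this.soap = soap;
        this.known.addAll(existingLocations);   // seed with locations already in the target system
    }

    public void upsert(String location) {
        // add() returns true only for the first caller with a new location,
        // so exactly one thread in this JVM performs the insert call.
        if (known.add(location)) {
            soap.insertLocation(location);
        } else {
            soap.updateLocation(location);
        }
    }
}
```

Like a Java lock, though, this only holds within a single JVM/node, so it wouldn't protect executions running on different nodes.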