
Lost contact with Snaplex node

msalter
New Contributor II

I have a very small pipeline (3 snaps) that reads from a SQL table and writes the results to a new table in the same DB. I keep getting the error "Lost contact with Snaplex node" while the pipeline is running. The select statement pulls 250+ million records, and I'm using the Azure bulk insert to write. To avoid this error I keep having to reduce the batch size, from 20K to 10K and now to 8K. Any thoughts on what could be causing the error?

1 ACCEPTED SOLUTION

koryknick
Employee

@darshthakkar - I believe @dmiller is correct that Shard Offsets is a custom snap. I've replicated the logic with core snaps and built-in expressions. See the attached example.
Shard Example_2023_07_05.slp (5.6 KB)

The Mapper is where the magic happens.

sl.range(0, $TOTAL, Math.ceil($TOTAL / parseInt(_shard_count)))
.map((val,idx,arr)=> 
  idx == arr.length - 1 
  ? { offset : val, limit : $TOTAL - val } 
  : { offset : val, limit : arr[idx + 1] - val } 
)

First, use the sl.range() built-in function to generate an array of the offsets to be used, then use the Array.map() method to transform that simple array of integers into an array of objects providing both the offset and limit for each shard.
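
For illustration only, here is a minimal TypeScript sketch that reproduces the same offset/limit computation outside of SnapLogic. The function name shardOffsets and the sample values (250,000,000 total records, 4 shards) are assumptions for the example, not values taken from the pipeline above.

// Reproduces the Mapper expression: build the offsets with a range, then map
// each offset to an { offset, limit } pair; the last shard takes the remainder.
function shardOffsets(total: number, shardCount: number): { offset: number; limit: number }[] {
  const step = Math.ceil(total / shardCount);
  const offsets: number[] = [];
  for (let v = 0; v < total; v += step) {
    offsets.push(v); // equivalent of sl.range(0, total, step)
  }
  return offsets.map((val, idx, arr) =>
    idx === arr.length - 1
      ? { offset: val, limit: total - val }        // last shard: whatever is left
      : { offset: val, limit: arr[idx + 1] - val } // other shards: span to the next offset
  );
}

// shardOffsets(250_000_000, 4) returns:
// [ { offset: 0,         limit: 62500000 },
//   { offset: 62500000,  limit: 62500000 },
//   { offset: 125000000, limit: 62500000 },
//   { offset: 187500000, limit: 62500000 } ]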

After the Mapper, just use a JSON Splitter to get a new document for each offset/limit combination (the same output as the Shard Offsets snap), which would feed your child pipeline through the Pipeline Execute snap.
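
To make the downstream step concrete, the sketch below shows how each split document could drive a paged SELECT in the child pipeline. This is an assumption about the child pipeline's query, not something attached to the thread; the table name dbo.SourceTable and key column Id are placeholders. SQL Server's OFFSET/FETCH paging requires a stable ORDER BY key so the shards stay disjoint.

// Each document out of the JSON Splitter carries one shard; a child pipeline
// would typically receive offset and limit as pipeline parameters and use them
// to page through the source table.
type Shard = { offset: number; limit: number };

function shardQuery(shard: Shard): string {
  // SQL Server paging pattern; ORDER BY on a unique, stable key keeps shards disjoint.
  return `SELECT * FROM dbo.SourceTable ORDER BY Id ` +
         `OFFSET ${shard.offset} ROWS FETCH NEXT ${shard.limit} ROWS ONLY`;
}

// Example: { offset: 62500000, limit: 62500000 } produces
// SELECT * FROM dbo.SourceTable ORDER BY Id OFFSET 62500000 ROWS FETCH NEXT 62500000 ROWS ONLY
console.log(shardQuery({ offset: 62500000, limit: 62500000 }));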

Hope this helps!


13 REPLIES

ljupcho_machko1
New Contributor III

Hi @msalter,

I assume the root cause is memory usage. In my case I was receiving the error message below:
[screenshot of error message not preserved]

It was resolved by using the Shard Offsets snap and transferring the data in parallel threads using the Pipeline Execute snap.

msalter
New Contributor II

Thanks @ljupcho_machkovski. I thought that might be the case too, but CPU never peaks above 60% and memory never peaks above 35%.

darshthakkar
Valued Contributor

@ljupcho_machkovski - Can you please share more details on how you resolved this?
Thank you.

@darshthakkar It was resolved by splitting the source payload into chunks instead of processing all the records at once.