Forum Discussion

msalter
New Contributor II
4 years ago
Solved

Lost contact with Snaplex node

I have a very small pipeline (3 snaps) that reads from a SQL table and writes the results to a new table in the same DB. I keep getting the error "Lost contact with Snaplex node" while the pipeline is running. The select statement pulls 250M+ records, and I'm using the Azure bulk insert to write. To avoid this error I keep having to reduce the batch size, from 20K to 10K and now to 8K. Any thoughts on what could be causing the error?

18 Replies

  • @darshthakkar - I believe @dmiller is correct that Shard Offsets is a custom snap. I’ve replicated the logic with core snaps and built-in expressions. See the attached example.
    Shard Example_2023_07_05.slp (5.6 KB)

    The Mapper is where the magic happens.

    sl.range(0, $TOTAL, Math.ceil($TOTAL / parseInt(_shard_count)))
      .map((val, idx, arr) =>
        idx == arr.length - 1
          ? { offset: val, limit: $TOTAL - val }
          : { offset: val, limit: arr[idx + 1] - val }
      )
    

    First, the sl.range() built-in function generates an array of the offsets to be used; then the Array.map() method turns that simple array of integers into an array of objects providing both the offset and limit for each shard.
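The Mapper expression can be sketched in plain JavaScript. This is only an illustration: `range` stands in for SnapLogic's `sl.range()` builtin, and the values for `TOTAL` and `shardCount` are hypothetical sample inputs.

```javascript
// Stand-in for SnapLogic's sl.range(start, stop, step) builtin.
function range(start, stop, step) {
  const out = [];
  for (let v = start; v < stop; v += step) out.push(v);
  return out;
}

const TOTAL = 10;     // hypothetical total record count (e.g. from SELECT COUNT(*))
const shardCount = 4; // hypothetical value of the _shard_count pipeline parameter

// Same logic as the Mapper expression: evenly spaced offsets, with the
// last shard taking whatever remainder is left.
const shards = range(0, TOTAL, Math.ceil(TOTAL / shardCount))
  .map((val, idx, arr) =>
    idx === arr.length - 1
      ? { offset: val, limit: TOTAL - val }        // last shard: remainder
      : { offset: val, limit: arr[idx + 1] - val } // others: span to next offset
  );

console.log(shards);
// [ { offset: 0, limit: 3 }, { offset: 3, limit: 3 },
//   { offset: 6, limit: 3 }, { offset: 9, limit: 1 } ]
```

Each object then becomes one document for the child pipeline, which can use its `offset`/`limit` pair to page through the source table.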

    After the Mapper, just use a JSON Splitter to get a new document for each offset/limit combination (the same output the Shard Offsets snap produces), which then feeds your child pipeline via the Pipeline Execute snap.

    Hope this helps!

    • ashis1upreti
      New Contributor II

      Hello koryknick, can you please help with my case? I am facing the same issue due to a large volume of data, and I did not quite understand how to set this up. Where do I add these Mappers? Basically, my child pipeline reads from an S3 URI passed by the parent pipeline, and when it splits that file's content it has 2M+ records to insert into the respective tables. I am attaching screenshots of my child and parent pipelines; where exactly do I modify them?

      (screenshots attached)

      • koryknick
        Employee

        ashis1upreti - The error you are getting has nothing to do with transaction volume. You are using an incorrect account type for the snap. When creating or selecting accounts, I recommend doing it in Designer via the Account dropdown in the snap where you want to use it. Keep in mind that accounts are specific to the snap pack of the snap being used, so you cannot use an "AWS S3" account (which exists in the AWS S3 snap pack) with the File Reader or S3 File Reader snaps (both of which are in the Binary snap pack).

        In the future, please open a new thread rather than reviving one that has already been resolved. This helps keep each topic focused. You can always link to a previous thread if you believe it is related.

  • ljupcho_machko1
    New Contributor III

    Hi @msalter,

    I assume the root cause is memory usage. In my case I was receiving the error message shown in the attached screenshot.

    It was resolved by using the Shard Offsets snap and transferring the data in threads via the Pipeline Execute snap.

      • darshthakkar
        Valued Contributor

        Did you use a Pipeline Execute with a batch/pool size defined?

  • darshthakkar
    Valued Contributor

    I did check the documentation, and there is nothing specified about the "Shard" snaps or sharding in general.

    • dmiller
      Former Employee

      I believe it is a custom Snap. Trying to track it down now. 🙂