02-11-2024 09:22 PM
Hi,
I have 3 input streams, like the sample below:
Input-1
ASERUFVH816783KGGVKLHO
KJGFYGCVJVJHJ7859444MBJHJHGFK
JBJGFR565675GGJFJFJGFKFFTDDK67
Input-2
KJGLFUDGCLHGIFGFKTDEDDXJGCCHG46533856
VHJFHTDGFXCGHCHKGDDDDHGCJVJGT6764468MVNVCHGFH
5444755687VHJJFFHGCVJGVJGF55764674
Input-3
HVLJFVBLGUFTFYFJVBKGFDDYDVHJBKJ44568459
BGJKFTYEDCJVHJGUFYYTESRFGCHVJHLHIGF455656
VHJFTRYTRFJHVJKFTYDDCGVJGLIYFYTEYD763653356
I have to join these vertically. When I use the Union Snap, the records are not added in sequence; they are interleaved in random order. When I use the Join Snap with the Merge option, they are combined horizontally. Can someone let me know how to append the input streams one below the other without changing the sequence?
02-12-2024 01:03 AM - edited 02-12-2024 01:03 AM
Hello @kumar25,
You can try introducing a Gate Snap to gather all inputs, then concatenating them with the array concat function in a Mapper Snap, and finally splitting them with the JSON Splitter Snap.
Below is a sample pipeline demonstrating this method.
Take note of the Snaplex memory usage, as the Gate Snap waits for all documents from upstream.
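For readers without access to the attached pipeline, the gather / concatenate / split idea can be sketched outside SnapLogic. This is a rough Python analogy, not SnapLogic expression code; the function name `gate_concat_split` and the sample records are made up for illustration.

```python
from itertools import chain


def gate_concat_split(*input_views):
    """Rough analogy of Gate -> Mapper (array concat) -> JSON Splitter."""
    # "Gate" step: every view is read completely into memory first.
    # This is where the memory cost mentioned above comes from.
    gathered = [list(view) for view in input_views]
    # "Mapper" step: concatenate the arrays in input-view order.
    combined = list(chain.from_iterable(gathered))
    # "JSON Splitter" step: emit each element as its own record.
    for record in combined:
        yield record


# Hypothetical sample data standing in for the three input streams.
input_1 = ["input-1 line 1", "input-1 line 2"]
input_2 = ["input-2 line 1", "input-2 line 2"]
input_3 = ["input-3 line 1", "input-3 line 2"]

appended = list(gate_concat_split(input_1, input_2, input_3))
```

Because each view is fully collected before anything is emitted, the order inside each view and the order of the views are both preserved, at the price of holding everything in memory at once.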
Please let me know if this helps you.
Regards,
Aleksandar.
02-12-2024 05:45 AM
@kumar25 - Do you just need something like the following?
input-1 line1
input-2 line1
input-3 line1
input-1 line2
...etc?
I've attached an example pipeline showing how to accomplish this. Basically, it adds variables to track which input each record comes from and the record number of each record. Once the data is combined using Union, sort the data on the record number and input view. Finally, remove the temporary values used for this process. Note the use of the "Passthrough" option in the Mappers, so I don't need to know what the record layout is.
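The steps above (tag, union, sort, strip) can be sketched in Python as an analogy for readers who cannot open the attachment. The field names `input`, `recnum`, and `data`, the helper names, and the shuffle used to imitate Union's unordered output are all assumptions made for illustration, not the actual pipeline contents.

```python
import random


def tag(view_number, records):
    """Mapper analogy: add the input view and record number to each record."""
    return [
        {"input": view_number, "recnum": i, "data": record}
        for i, record in enumerate(records, start=1)
    ]


def union(*tagged_views, seed=0):
    """Union analogy: combine all views with no ordering guarantee."""
    combined = [doc for view in tagged_views for doc in view]
    random.Random(seed).shuffle(combined)  # imitate the arbitrary arrival order
    return combined


def sort_and_strip(docs):
    """Sort on record number, then input view, and drop the temporary fields."""
    ordered = sorted(docs, key=lambda d: (d["recnum"], d["input"]))
    return [d["data"] for d in ordered]


# Hypothetical sample data standing in for the three input streams.
views = [
    tag(1, ["input-1 line 1", "input-1 line 2"]),
    tag(2, ["input-2 line 1", "input-2 line 2"]),
    tag(3, ["input-3 line 1", "input-3 line 2"]),
]
result = sort_and_strip(union(*views))
```

Since the sort keys fully determine the order, the result is the same no matter how Union happens to interleave the documents.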
Hope this helps!
PS - I appreciate @Aleksandar_A 's contribution; however, I do warn against using the Gate snap with the "All input documents" setting. If your input dataset is large, it can consume considerable resources on your execution nodes, causing other pipelines to pause while they wait on resources or, in the worst case, crashing the node, depending on other activity. Gate is a powerful snap and can be used very effectively, just remember:
02-13-2024 06:28 AM
@koryknick , I need data to be appended as below.
Input-1 line 1
Input-1 line 2
Input-1 line 3
Input-2 line 1
Input-2 line 2
Input-2 line 3
Input-3 line 1
Input-3 line 2
Input-3 line 3
02-13-2024 07:47 AM
Just change the order of the Sort Paths in the "Sort by recnum and input" Sort Snap in @koryknick's pipeline, so that it sorts on the input view first and the record number second.
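To see why swapping the sort paths changes the layout, here is a small Python analogy. The field names `input`, `recnum`, and `data` and the helper `sort_docs` are hypothetical; the documents are listed in a scrambled order to imitate what Union might produce.

```python
def sort_docs(docs, sort_paths):
    """Sort documents on the given fields, in the given priority order."""
    return sorted(docs, key=lambda d: tuple(d[path] for path in sort_paths))


# Hypothetical tagged documents, deliberately out of order.
docs = [
    {"input": 2, "recnum": 1, "data": "input-2 line 1"},
    {"input": 1, "recnum": 2, "data": "input-1 line 2"},
    {"input": 3, "recnum": 1, "data": "input-3 line 1"},
    {"input": 1, "recnum": 1, "data": "input-1 line 1"},
    {"input": 3, "recnum": 2, "data": "input-3 line 2"},
    {"input": 2, "recnum": 2, "data": "input-2 line 2"},
]

# Record number first: line 1 of every input, then line 2 of every input.
interleaved = [d["data"] for d in sort_docs(docs, ["recnum", "input"])]

# Input view first: all of input 1, then all of input 2, then all of input 3.
appended = [d["data"] for d in sort_docs(docs, ["input", "recnum"])]
```

The second ordering is the one requested in this thread: each input stays together, and the record number keeps the original sequence within it.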