Append input data vertically

kumar25
New Contributor II

Hi,

I have 3 input streams, like below sample

Input-1

ASERUFVH816783KGGVKLHO

KJGFYGCVJVJHJ7859444MBJHJHGFK

JBJGFR565675GGJFJFJGFKFFTDDK67

 

Input-2

KJGLFUDGCLHGIFGFKTDEDDXJGCCHG46533856

VHJFHTDGFXCGHCHKGDDDDHGCJVJGT6764468MVNVCHGFH

5444755687VHJJFFHGCVJGVJGF55764674

 

Input-3

HVLJFVBLGUFTFYFJVBKGFDDYDVHJBKJ44568459

BGJKFTYEDCJVHJGUFYYTESRFGCHVJHLHIGF455656

VHJFTRYTRFJHVJKFTYDDCGVJGLIYFYTEYD763653356

 

I have to join these vertically. When I use the Union Snap, the inputs are not added in sequence; the documents arrive in random order. When I use the Join Snap with the Merge option, they are added horizontally. Can someone let me know how to append the input streams one below the other without changing the sequence?

4 REPLIES

AleksandarAngel
Contributor III

Hello @kumar25,

You can try adding a Gate Snap to gather all inputs, then concatenating them with the array concat function in a Mapper Snap. Finally, split the result with the JSON Splitter Snap.

Below is a sample pipeline demonstrating this method.

Take note of the Snaplex memory usage, as the Gate Snap waits for all upstream documents before passing them on.
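For readers who want to see the logic outside the Designer, here is a rough Python sketch of the gather, concat, split idea. It is not SnapLogic code; the function name `append_in_order` and the sample strings are made up for illustration, and the comments only map each step loosely onto the Snaps named above.

```python
from itertools import chain

def append_in_order(*streams):
    """Yield every record of the first stream, then the second, and so on."""
    # "Gate" step: read every input to the end before doing anything else.
    # All records are held in memory here, which is the cost noted above.
    gathered = [list(stream) for stream in streams]
    # "Mapper concat" step: join the per-input arrays in input order.
    combined = list(chain.from_iterable(gathered))
    # "JSON Splitter" step: emit the combined array one record at a time.
    yield from combined

if __name__ == "__main__":
    input_1 = ["input-1 line1", "input-1 line2"]
    input_2 = ["input-2 line1", "input-2 line2"]
    input_3 = ["input-3 line1", "input-3 line2"]
    for record in append_in_order(input_1, input_2, input_3):
        print(record)
```

The order within each input is preserved because each list is copied as-is, and the order between inputs is fixed by the order of the arguments.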

Please let me know if this helps you.

Regards,

Aleksandar.

koryknick
Employee

@kumar25 - Do you just need something like the following?

input-1 line1
input-2 line1
input-3 line1
input-1 line2
...etc?

I've attached an example pipeline that shows how to accomplish this. Basically, it adds variables to track which input each record comes from and the record number of each record. Once the data is combined using Union, sort it on the record number and input view. Finally, remove the temporary values used for this process. Note the use of the "Passthrough" option in the Mappers, so I don't need to know what the record layout is.
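As a rough illustration of the steps just described (tag, union, sort, strip), here is a small Python sketch. It is not SnapLogic code: the field names `input` and `recnum`, the helper names, and the shuffle that stands in for Union's unpredictable ordering are all assumptions made for the example.

```python
import random

def tag(stream, input_no):
    # Mapper with pass-through: keep the record untouched under "data"
    # and add two temporary tracking fields next to it.
    return [
        {"input": input_no, "recnum": n, "data": record}
        for n, record in enumerate(stream, start=1)
    ]

def union(*tagged_streams):
    # Stand-in for the Union Snap: all documents arrive, in no guaranteed order.
    merged = [doc for stream in tagged_streams for doc in stream]
    random.shuffle(merged)
    return merged

def interleave(*streams):
    docs = union(*(tag(s, i) for i, s in enumerate(streams, start=1)))
    # Sort on record number first, then on the input the record came from.
    docs.sort(key=lambda d: (d["recnum"], d["input"]))
    # Drop the temporary fields and return only the original records.
    return [d["data"] for d in docs]

if __name__ == "__main__":
    result = interleave(
        ["input-1 line1", "input-1 line2"],
        ["input-2 line1", "input-2 line2"],
        ["input-3 line1", "input-3 line2"],
    )
    print("\n".join(result))
```

Because the tracking fields are added before the Union, the final order no longer depends on the order in which documents happen to arrive.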

Hope this helps!

PS - I appreciate @AleksandarAngel's contribution; however, I do warn against using the Gate Snap with the "All input documents" setting. If your input dataset is large, it can consume considerable resources on your execution nodes, causing other pipelines to pause while they wait on resources or, in the worst case, crashing the node, depending on other activity. Gate is a powerful Snap and can be used very effectively, just remember:

koryknick_0-1707745483163.jpeg

 

 

kumar25
New Contributor II

@koryknick , I need data to be appended as below.

 

Input-1 line 1

Input-1 line 2

Input-1 line 3

Input-2 line 1

Input-2 line 2

Input-2 line 3

Input-3 line 1

Input-3 line 2

Input-3 line 3

Just change the order of the Sort Paths in the "Sort by recnum and input" Sort Snap in @koryknick's pipeline, so that the data is sorted by input first and then by record number.

AleksandarAngel_0-1707839206109.png
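To make the effect of swapping the sort paths concrete, here is a short Python sketch. It is only an illustration: the field names `input` and `recnum`, the helper `ordered`, and the sample documents are assumptions, not taken from the attached pipeline.

```python
# Tagged documents as they might arrive after a Union, in arbitrary order.
docs = [
    {"input": 2, "recnum": 1, "data": "Input-2 line 1"},
    {"input": 1, "recnum": 2, "data": "Input-1 line 2"},
    {"input": 3, "recnum": 1, "data": "Input-3 line 1"},
    {"input": 1, "recnum": 1, "data": "Input-1 line 1"},
    {"input": 3, "recnum": 2, "data": "Input-3 line 2"},
    {"input": 2, "recnum": 2, "data": "Input-2 line 2"},
]

def ordered(documents, *sort_paths):
    # Sort on the given fields, in the order they are listed,
    # then return only the original record of each document.
    key = lambda d: tuple(d[path] for path in sort_paths)
    return [d["data"] for d in sorted(documents, key=key)]

# Record number first: line 1 of every input, then line 2 of every input.
interleaved = ordered(docs, "recnum", "input")

# Input first: all of Input-1, then all of Input-2, then all of Input-3.
appended = ordered(docs, "input", "recnum")

if __name__ == "__main__":
    print("\n".join(appended))
```

Only the order of the two sort paths differs between the two results; the tagging and the Union stay the same.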