06-28-2021 01:49 PM
Hi All,
I am looking for directions on how to perform a join based on a BETWEEN clause. I would like to join these two sources on Source1.Key between Source2.Key Start and Source2.Key End.
Source 1 :
|Key|Name|
|0|ABC0|
|1|ABC1|
|2|ABC2|
|3|ABC3|
|4|ABC4|
|5|ABC5|
|6|ABC6|
|7|ABC7|
|8|ABC8|
|9|ABC9|
|10|ABC10|
|11|ABC11|
|12|ABC12|
|13|ABC13|
|14|ABC14|
|15|ABC15|
|16|ABC16|
Source 2 :
|Key Start|Key End|Grp_number|
|0|0|1|
|1|4|2|
|5|10|3|
|11|13|4|
|14|16|5|
Expected Output :
|Key|Name|Key Start|Key End|Grp_number|
|0|ABC0|0|0|1|
|1|ABC1|1|4|2|
|2|ABC2|1|4|2|
|3|ABC3|1|4|2|
|4|ABC4|1|4|2|
|5|ABC5|5|10|3|
|6|ABC6|5|10|3|
|7|ABC7|5|10|3|
|8|ABC8|5|10|3|
|9|ABC9|5|10|3|
|10|ABC10|5|10|3|
|11|ABC11|11|13|4|
|12|ABC12|11|13|4|
|13|ABC13|11|13|4|
|14|ABC14|14|16|5|
|15|ABC15|14|16|5|
|16|ABC16|14|16|5|
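For clarity, here is a minimal Python sketch of the join semantics being asked for (the data below is copied from the tables above; the sketch is only an illustration, not a SnapLogic solution):

```python
# Source 1: one record per Key.
source1 = [{"Key": k, "Name": f"ABC{k}"} for k in range(17)]

# Source 2: ranges with a group number.
source2 = [
    {"Key Start": 0,  "Key End": 0,  "Grp_number": 1},
    {"Key Start": 1,  "Key End": 4,  "Grp_number": 2},
    {"Key Start": 5,  "Key End": 10, "Grp_number": 3},
    {"Key Start": 11, "Key End": 13, "Grp_number": 4},
    {"Key Start": 14, "Key End": 16, "Grp_number": 5},
]

# "Between" join: keep every pair where Key falls inside the range.
output = [
    {**row1, **row2}
    for row1 in source1
    for row2 in source2
    if row2["Key Start"] <= row1["Key"] <= row2["Key End"]
]
```

Each Source 1 record matches exactly one range here, so `output` reproduces the expected table above.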
06-29-2021 12:29 PM
OK - got it. SnapLogic does not support a "between" operation in Join snaps. So to accomplish this in a streaming manner, I propose the attached. Basically, it generates the missing records on the bottom path before the join. This should keep memory overhead low, since the only arrays kept in memory are the ones generated to fill the gaps.
Join Between_2021_06_29.slp (10.1 KB)
Of course, this does produce many more records on that bottom path, but since you are using an inner join, memory usage in the Join stays low as well.
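The idea of generating the missing records can be sketched in Python as follows (this is only an illustration of the technique, not the attached pipeline: each range row is expanded into one record per Key, so a plain equality inner join can replace the unsupported "between" join):

```python
source1 = [{"Key": k, "Name": f"ABC{k}"} for k in range(17)]
source2 = [
    {"Key Start": 0,  "Key End": 0,  "Grp_number": 1},
    {"Key Start": 1,  "Key End": 4,  "Grp_number": 2},
    {"Key Start": 5,  "Key End": 10, "Grp_number": 3},
    {"Key Start": 11, "Key End": 13, "Grp_number": 4},
    {"Key Start": 14, "Key End": 16, "Grp_number": 5},
]

# Bottom path: expand each range into one record per Key it covers.
expanded = [
    {"Key": k, "Key Start": r["Key Start"],
     "Key End": r["Key End"], "Grp_number": r["Grp_number"]}
    for r in source2
    for k in range(r["Key Start"], r["Key End"] + 1)
]

# Ordinary inner join on the now-shared Key field.
lookup = {rec["Key"]: rec for rec in expanded}
joined = [{**row, **lookup[row["Key"]]} for row in source1 if row["Key"] in lookup]
```

The expansion produces more records, but each one is small, and the join itself is a simple key-equality match.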
06-28-2021 03:34 PM
Hey @Majid,
I've uploaded a pipeline with a solution to your problem. The second Mapper just rearranges the fields; you don't really need it if the order doesn't matter.
Join Between_2021_06_29.slp (10.0 KB)
Hope you'll find this helpful,
Regards,
Bojan
06-29-2021 07:07 AM
Thank you @bojanvelevski for the great solution. I tried it in my use case: it works fine with a small set of records, but with a large number of records the Join after the Gate takes a long time, with a throughput of 3.3 docs/second. My use case has 6 million records.
06-29-2021 08:44 AM
@Majid
The Gate snap is designed to wait for all upstream snaps to finish executing and to collect all the input documents it receives into a single output document. I don't think Gate is the right choice here: collecting 6 million records into a single document may be your bottleneck.
06-29-2021 09:16 AM
Yes, I agree. You can try to optimize the process: separate the incoming records into smaller chunks with the Group By N snap instead of Gate, and use Pipeline Execute with "Reuse executions" and an increased pool size. But keep an eye on resources; that's a pretty large amount of data.
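The chunking behavior of Group By N can be sketched in Python like this (a hypothetical illustration only; the real snap groups streaming documents inside the pipeline):

```python
def group_by_n(records, n):
    """Yield successive chunks of up to n records,
    analogous to what the Group By N snap does."""
    for i in range(0, len(records), n):
        yield records[i:i + n]

# Example: 17 records split into chunks of 5.
records = [{"Key": k} for k in range(17)]
chunks = list(group_by_n(records, 5))
```

Each chunk could then be handed to a child pipeline via Pipeline Execute, so no single document ever holds all 6 million records.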