Problem with copy and join

Hello,

I am working on this pipeline

The problem is with this part

I copy my result set.
I keep the first output as it is,
aggregate the second output,
and finally I want to bring those aggregated results back onto the first output using a lookup.

This makes my pipeline run endlessly. If I remove the lookup (or join) and write to two different files instead, it takes less than 2 minutes.
I think it may be because the two outputs are copies of the same result set.

Could you please tell me if you have seen this issue before and how to fix it?

Thank you

Hi @Wassim,

Try sorting the data first, before you send it to the join.

How many records did you have before copying them?
If it’s only one, then instead of the Join snap try the Gate snap. You will get the same result as with Join. If there are more records and you are joining on some conditions, then you cannot use Gate.

Regards.

Hi Viktor,

Thank you for the answer.
I have 6k rows, so I can't use the Gate.
I sorted on the same field I group by, and it doesn't help.

Did you mean the Join Snap when you referred to the Lookup snap here? Also, what is the output from the Aggregate Snap and the Mapper after it?

Hi skatpally, thank you very much for the answer. It's a Lookup snap.

Here is the aggregate

and here is the mapper after the aggregate

thank you

@Wassim

Below are some points you should keep in mind about the In-Memory Lookup Snap:

  • The join operation within the snap starts only when the right input document stream ends. In your case, that means the snap first waits for the aggregation to complete, and only then does the In-Memory Lookup process the data.
  • All the right input data is loaded into memory (of the JVM) as lookup data, so the Snap can cause poor performance.
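
To make the two points above concrete, here is a minimal Python sketch of the behaviour described. It is not SnapLogic code: the function name `in_memory_lookup`, the field names, and the way unmatched rows are passed through are assumptions for illustration only.

```python
def in_memory_lookup(left, right, key):
    """Hypothetical model of an in-memory lookup over two document streams."""
    # Step 1: consume the whole right stream into memory first.
    table = {}
    for doc in right:
        table[doc[key]] = doc

    # Step 2: only now start streaming the left input against the table.
    for doc in left:
        match = table.get(doc[key])
        # Assumption: unmatched left rows pass through unchanged.
        yield {**doc, **match} if match is not None else dict(doc)


rows = [{"dept": "a", "id": 1}, {"dept": "b", "id": 2}]
totals = [{"dept": "a", "total": 10}]
for joined in in_memory_lookup(iter(rows), iter(totals), "dept"):
    print(joined)
```

Nothing reaches the output until `right` is exhausted, which is why an upstream Aggregate has to finish before the lookup produces anything.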

Are any other processes running in parallel with this one that also use similar snaps (join, aggregation, group snaps, etc.) and put pressure on memory?

Have you tried the same scenario using the Join Snap?

Regards,
Spiro Taleski

@Spiro_Taleski
Thank you for the answer.
I did try Join as well.
I am aware of all that. I have only 6000 rows to aggregate and join back to my first result, and it's not even moving.
It's like this:

If I duplicate my snaps and run the aggregate and the join from the original snaps, it takes less than a minute.

Regards

here is a simple example
test copy and join_2021_09_08.slp (14.2 KB)

test - 2021-09-03T152531.018.xlsx (740.4 KB)

Hi Wassim,

This is actually a known issue (SWAT-3096) that we’re working on a fix for. It happens when there are at least 1024 records being copied by the Copy snap, for reasons that are a bit difficult to explain.

Until we have a fix, there are at least three workarounds:

  • Swap the order of the inputs to the Lookup snap, so that the output of the Aggregate is the first input rather than the second.
  • Insert a Sort snap right after each output of the Copy snap. It won’t work if you put the Sort before the Copy. In this workaround, the point of the Sort snaps isn’t to sort the data, which might already be sorted – it’s to essentially create independent buffers of the data from each of the Copy snap’s output views.
  • Replace the Lookup with a Join, and set the Sorted streams property to Unsorted.
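
One way to picture why giving each Copy output its own buffer helps is a toy model in Python. This is only an assumed, simplified picture (a Copy that writes to both outputs in lockstep through a bounded buffer), not the actual SnapLogic implementation; the function name `run_pipeline`, the `buffer_cap` and `buffer_each_output` parameters, and the field names are all hypothetical.

```python
from collections import Counter, deque


def run_pipeline(records, buffer_cap, buffer_each_output=False):
    """Toy model of Copy -> (Lookup input 1, Aggregate -> Lookup input 2).

    Returns the joined rows, or None if the modelled pipeline stalls.
    """
    left_buffer = deque()  # Copy output 1, waiting to be read by the Lookup
    counts = Counter()     # Aggregate state; it drains its input immediately

    for rec in records:
        # The Lookup reads nothing from its first input until its second
        # input (the Aggregate output) has ended, and the Aggregate cannot
        # end before the Copy does. In this model a full bounded buffer
        # therefore blocks the Copy for good.
        if not buffer_each_output and len(left_buffer) >= buffer_cap:
            return None
        left_buffer.append(rec)
        counts[rec["key"]] += 1

    # Copy ended: the Aggregate emits, the Lookup loads it, then reads input 1.
    return [{**rec, "count": counts[rec["key"]]} for rec in left_buffer]


rows = [{"key": i % 3, "id": i} for i in range(10)]
print(run_pipeline(rows, buffer_cap=4))                           # stalls
print(run_pipeline(rows, buffer_cap=4, buffer_each_output=True))  # completes
```

In this sketch `buffer_each_output=True` stands in for a snap that drains each Copy output into its own buffer, as the Sort snaps do in the second workaround; the small `buffer_cap` is chosen only to keep the example short.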

thank you very much @ptaylor