Problem with copy and join

Wassim
New Contributor

Hello,

I am working on this pipeline:
[image: big]

The problem is with this part:
[image: small]

I copy my result set.
I keep my first output as is.
I aggregate my second output.
And finally I want to pull in those aggregated results using a Lookup.

This makes my pipeline run endlessly. If I remove the Lookup (or Join) and write the two outputs to two different files, it takes less than 2 minutes.
I think it may be because the two outputs are copies of the same result set.

Could you please tell me if you have seen this issue before and how to work around it?

Thank you

1 ACCEPTED SOLUTION

ptaylor
Employee

Hi Wassim,

This is actually a known issue (SWAT-3096) that we’re working on a fix for. It happens when there are at least 1024 records being copied by the Copy snap, for reasons that are a bit difficult to explain.

Until we have a fix, there are at least three workarounds:

  • Swap the order of the inputs to the Lookup snap, so that the output of the Aggregate is the first input rather than the second.
  • Insert a Sort snap right after each output of the Copy snap. It won’t work if you put the Sort before the Copy. In this workaround, the point of the Sort snaps isn’t to sort the data, which might already be sorted – it’s to essentially create independent buffers of the data from each of the Copy snap’s output views.
  • Replace the Lookup with a Join, and set the Sorted streams property to Unsorted.
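To make the "independent buffers" idea behind the Sort workaround more concrete, here is a minimal toy simulation in Python. It is not SnapLogic code and does not claim to reproduce the actual cause of SWAT-3096. It assumes (purely for illustration) that each Copy output view has a bounded buffer of `capacity` documents, that the Copy blocks when either buffer is full, and that the Lookup only starts reading its first input once the Aggregate has emitted its result. The function name `run_pipeline`, the parameter names, and the 1024 default are all hypothetical and chosen only to echo the threshold mentioned above.

```python
from collections import deque


def run_pipeline(num_records, capacity=1024, sort_after_copy=False):
    """Toy model: return True if all records reach the Lookup, False on a stall.

    Assumptions (hypothetical, not SnapLogic internals):
    - the Copy writes each record to both outputs and blocks if either is full;
    - the Aggregate emits only after its input stream has ended;
    - the Lookup reads its first input only once the Aggregate result exists;
    - an optional Sort drains the direct branch into an unbounded buffer.
    """
    direct = deque()       # Copy output 1, bounded, feeds the Lookup (or the Sort)
    to_agg = deque()       # Copy output 2, bounded, feeds the Aggregate
    sort_buffer = deque()  # unbounded buffer, used only when sort_after_copy=True
    produced = 0
    joined = 0
    agg_done = False

    while True:
        progressed = False

        # Copy: needs free space in BOTH output buffers to emit a record.
        if produced < num_records and len(direct) < capacity and len(to_agg) < capacity:
            direct.append(produced)
            to_agg.append(produced)
            produced += 1
            progressed = True
        copy_done = produced == num_records

        # Aggregate: consumes eagerly, but finishes only after the Copy has ended.
        if to_agg:
            to_agg.popleft()
            progressed = True
        elif copy_done and not agg_done:
            agg_done = True
            progressed = True

        # Optional Sort: keeps draining the direct branch so the Copy never blocks.
        if sort_after_copy and direct:
            sort_buffer.append(direct.popleft())
            progressed = True

        # Lookup: waits for the Aggregate result, then reads the direct branch.
        if agg_done and (not sort_after_copy or not direct):
            source = sort_buffer if sort_after_copy else direct
            if source:
                source.popleft()
                joined += 1
                progressed = True

        if not progressed:
            # Nothing can move: either everything finished or the model is stuck.
            return joined == num_records


if __name__ == "__main__":
    print(run_pipeline(1024))                        # fits in the buffer
    print(run_pipeline(1025))                        # Copy blocks, nothing moves
    print(run_pipeline(5000, sort_after_copy=True))  # Sort absorbs the backlog
```

In this model the stall appears as soon as the record count exceeds the buffer size: the Copy cannot finish because the direct branch is full, the Aggregate cannot finish because the Copy has not ended, and the Lookup is waiting on the Aggregate. The exact boundary and mechanism in the real product may differ.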

11 REPLIES

vgautam64
New Contributor III

I too faced this exact problem recently and this thread proved really helpful. Thanks a lot!

Considering this post is more than 1.5 years old now, what is the status on the fix for this issue?

dd_snaplogic
New Contributor II

Hey, I’m also facing the same issue. Any updates on the fix?