Adding more lookup decreasing the performance (Redshift lookup / SQL server Lookup)

Hi Team,

     Need your help /solution to solve the problem. 

When we trying to add more lookup’s in the pipeline performance is drastically decreasing . To load some 30K records with 3 lookup snap, it took around 30 minutes. Instead of lookup, if i use the join instead of lookup pipeline has completed within 3 minutes.

source --> Redshift Lkp1 (Good performance) --> Redshift Lkp2 (Slow) --> Redshift Lkp3 (Very slow) --> Target

Instead of Redshift , I tried with SQL server lookup and getting the same issue… Also, interchanged the lookup’s and tried it… That’s also not working…

Do you have any idea, what could be the reason for this performance issue?.


Well, query lookup, especially on databases, can be very high. Is there an option to cache it? Some ETL products run a query to get all candidates, and they use a binary tree to search it. It is not only a bit faster because it gets rid of so much network traffic and latency, but it also is faster because you have ONE moderate operation instead of maybe thousands or more ones that are smaller but still very expensive.

The join, works essentially the same way the binary tree method would, but requires changing your logic a little, etc…