I have multiple data-stores and need to hydrate data pulled from one with data from another.
The issue is that if I have two independent staring snaps perform the queries, I would have to pull every item in the table (hundreds of thousands) just to join one column with the simple ID field from the first data-set.
It doesn’t make sense to me to pull every item from a table when the first query provides the list of IDs that needs to be pulled. If I put a query snap in the middle of the pipeline then it creates a select statement for each data-point which is very bad as it results in 10K+ individual select statements that are just looking for IDs.
I cannot find anything anywhere on how to group queries to do this. I have tried using the aggregate snap and the group by snaps and cannot find a way to do this.
Is it the case that SnapLogic is built in a way that inhibits me from doing this?
It seems like a lot of overhead to do something like write to a file then read the file just to get an array of IDs to perform a query.
I see query logs in Redshift that seem to point to queries sometimes being ‘chunked’ for these inline select snaps but for the most part they don’t, and I am unsure if this is just spotty visibility using the query viewer in Redshift.
Any help is appreciated, Thanks.