cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

How do I avoid multiple select requests

shane
New Contributor II

I have multiple data-stores and need to hydrate data pulled from one with data from another.
The issue is that if I have two independent staring snaps perform the queries, I would have to pull every item in the table (hundreds of thousands) just to join one column with the simple ID field from the first data-set.

It doesnโ€™t make sense to me to pull every item from a table when the first query provides the list of IDs that needs to be pulled. If I put a query snap in the middle of the pipeline then it creates a select statement for each data-point which is very bad as it results in 10K+ individual select statements that are just looking for IDs.

I cannot find anything anywhere on how to group queries to do this. I have tried using the aggregate snap and the group by snaps and cannot find a way to do this.

Is it the case that SnapLogic is built in a way that inhibits me from doing this?
It seems like a lot of overhead to do something like write to a file then read the file just to get an array of IDs to perform a query.

I see query logs in Redshift that seem to point to queries sometimes being โ€˜chunkedโ€™ for these inline select snaps but for the most part they donโ€™t, and I am unsure if this is just spotty visibility using the query viewer in Redshift.

Any help is appreciated, Thanks.

3 REPLIES 3

tstack
Former Employee

Ideally, youโ€™d be able to write a WHERE clause in the Select snap that referenced the ID array in the input document (e.g. the_id in $the_ids). But, I donโ€™t think thatโ€™s well supported by JDBC drivers. For now, youโ€™ll probably need to use an expression to expand the array into something like the_id in ($the_ids[0], $the_ids[1], ..., $the_ids[N]). Since thatโ€™s a bit complicated, Iโ€™ve created an expression library that has a function to do just that:

You can include that library in your pipeline and then use the following snippet to call the function in the โ€œWhere clauseโ€ property of the Select snap:

lib.sqlhelper.inArray('the_id', $, '$the_ids')

Hereโ€™s a screenshot of a Select snap with the property configured and an example of the generated SQL fragment:

image

Iโ€™ll also file a feature request to see if the snaps can be updated to more naturally support this use case.

tstack
Former Employee

Ah, I forgot thereโ€™s a Lookup snap that will batch multiple input documents into a single SELECT query with a WHERE clause that is dynamically generated. That might work better for you.

shane
New Contributor II

Thanks for the feedback!

I am receiving an error using your expression, it seems to not like a BigInteger type:
55%20PM

I think this may work with a little adjustment though. I tried converting each id into a string value first, but it seems that this error is thrown on the arr.length which isnโ€™t referencing the value but rather the array itself.

This is the output of the snap feeding my query snap: (if it helps)
15%20PM

I have not been able to find a way to turn this into an array to pass in.

As far as the Redshift Lookup snap, I will have to take the time to configure the connections as it looks like I will have to create a SnapLogic IAM role and everything. There is also the added cost of the Redshift snap pack just to do a simple select statement that I have to make sure my higher-ups are ok with.

Also, I have need to do this type of query into MSSQL and MySQL databases as well so the Redshift snaps wouldnโ€™t help much there. It would be nice to have a way to expose or pass in the document array itself to another snap, or to the expr file for more control.

Thanks again for all of the help.