12-29-2023 01:59 PM
I'm curious about a dynamic process to "paginate" from a SQL source. I was presented with the problem today and I'd like to know if I can build something to handle it.
I have a source that can have 10 rows one day and 1M rows the next, but the query can only handle roughly 10k rows at a time. So I would like dynamic chunking/pagination using LIMIT/OFFSET. Given X = 10k, run "SELECT * FROM foo OFFSET 0 LIMIT X"; if the row count equals X, run "SELECT * FROM foo OFFSET X LIMIT X", and so on until the row count is less than X. That makes sense, right?
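In Python-ish pseudocode, that loop would look something like the sketch below (a minimal sketch only: the table foo is from the example above, while the id ordering column and the DB-API connection `conn` are assumptions):

```python
PAGE_SIZE = 10_000  # X from the example above

def fetch_all_pages(conn):
    """Yield every row from foo, one LIMIT/OFFSET page at a time."""
    cur = conn.cursor()
    offset = 0
    while True:
        # A stable ORDER BY (here on an assumed id column) keeps the
        # pages consistent between queries.
        cur.execute(
            "SELECT * FROM foo ORDER BY id LIMIT %s OFFSET %s",
            (PAGE_SIZE, offset),
        )
        rows = cur.fetchall()
        yield from rows
        # A short page means the source is exhausted.
        if len(rows) < PAGE_SIZE:
            break
        offset += PAGE_SIZE
```

One caveat: OFFSET paging is only deterministic when the query has a stable ORDER BY, hence the assumed id column.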
The first approach I can think of is to query the source for an aggregate (a row count) and pass the result to an EXECUTE PIPELINE job that runs the queries in series and then unions/gates everything together at the end. I just don't like running the aggregate against the source. Any better ideas?
Thanks!
01-04-2024 05:30 AM
@rriley99 - please download the attached ZIP file, decompress it, and import the example pipeline. I'm using the sl.range built-in function to generate an array of offsets covering the total number of records, based on the count returned by the first snap. Then I simply split the array and call the Select snap to read the paginated results from the source table.
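For readers who haven't imported the pipeline, here is roughly what it does, sketched in Python for illustration (the table foo and page size come from the original post; the id ordering column and the DB-API connection `conn` are assumptions, and in the actual pipeline the offset array comes from sl.range and each offset drives a Select snap):

```python
PAGE_SIZE = 10_000

def fetch_with_count(conn):
    """Count first, then read one page per precomputed offset."""
    cur = conn.cursor()
    # First snap: get the total record count.
    cur.execute("SELECT COUNT(*) FROM foo")
    total = cur.fetchone()[0]

    # sl.range in the pipeline produces the same kind of offset
    # array, e.g. [0, 10000, 20000, ...].
    offsets = range(0, total, PAGE_SIZE)

    # One paginated select per offset; results are unioned at the end.
    rows = []
    for offset in offsets:
        cur.execute(
            "SELECT * FROM foo ORDER BY id LIMIT %s OFFSET %s",
            (PAGE_SIZE, offset),
        )
        rows.extend(cur.fetchall())
    return rows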
There is at least one solution I can think of that doesn't require the initial record count (for example, looping until a page comes back with fewer than X rows, as in the original post); however, I recommend the approach provided, since a record count from a database is typically not costly, especially for a 1M-record table. If we were talking hundreds of millions or billions of records, then a different approach would probably be worth pursuing.
Hope this helps!