cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

SQL "pagination"

rriley99
New Contributor III

I'm curious about a dynamic process to "paginate" from a SQL source. I was presented with the problem today and I'm curious if I can build something to do that.

So I have a source that can have 10 rows one day and 1M rows the next, however the query can only handle say ~10k rows at a time. So I would like a dynamic chunking/pagination utilizing limit/offset. Given X=10k, query "SELECT * FROM foo OFFSET 0 LIMIT X", if row count = X, "SELECT * FROM foo OFFSET X LIMIT X"... until row count < X. That makes sense right?

The first approach I can think of is to query the source for an aggregate and pass these results to an EXECUTE PIPELINE job that queries in series and then union/gate everything together at the end. I just don't like the aggregate against the source, any better ideas?

 

Thanks!

1 REPLY 1

koryknick
Employee
Employee

@rriley99 - please download the attached ZIP file, decompress it, and import the example pipeline.  I'm using the sl.range built-in function to generate an array that contains the offsets to be used to get my total number of records based on the count returned in the first snap.  Then I simply split the array and call the Select snap to read the paginated results from the source table.

There is at least one solution I can think of that doesn't require the initial record count; however, I recommend the approach provided since a record count from a database is typically not costly, especially with a 1M record table - if we were talking 100's of millions or billions of records, then a different approach is probably worth pursuing.

Hope this helps!