So it does the whole sort in memory? Yeah, that process is going to consume a lot of memory. If it were 100% efficient, it would take about 4GB, but sorts aren’t 100% efficient, so it could be more like 6GB, or even more, just for the sort. That is probably about 9% of the space on most ETL systems I have used. I figured it would offload stuff to temporary files or something similar.
The ZIP files it is reading from are another 0.5GB.
Ironically, I did that to save time on later processing for tests. But the customer wanted me to end up with full inserts, and that requires reducing one file to almost nothing, so the size is cut in half. I guess this means THAT might work, but that is just for now.
Is there a way to do a sort on the drive, rather than in memory? A lot of systems today sort data in chunks, write each sorted chunk to a file, and then do an ordered merge of those files. It is still pretty fast, and ends up with ordered data, but can work on systems having only a small amount of memory. That used to be a BIG concern. It isn’t so big now, but on a shared system, or one managed by someone else, it could be a nuisance. Otherwise, maybe I will have to check if we can add a database into the mix.
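
To make the chunk-and-merge idea concrete, here is a rough sketch in Python. It is not what the tool in question does, just an illustration of the general technique. The names external_sort and chunk_size are made up for this example, and it assumes each record is a single line of text with no embedded newlines, compared as plain strings.

```python
import heapq
import itertools
import os
import tempfile


def external_sort(lines, chunk_size=100_000):
    """Yield lines in sorted order, holding at most chunk_size lines in memory.

    Hypothetical helper: assumes records are strings without embedded newlines.
    """
    chunk_paths = []
    it = iter(lines)
    try:
        # Phase 1: sort one chunk at a time and spill each to a temp file.
        while True:
            chunk = sorted(itertools.islice(it, chunk_size))
            if not chunk:
                break
            fd, path = tempfile.mkstemp(suffix=".chunk", text=True)
            with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as f:
                f.writelines(line + "\n" for line in chunk)
            chunk_paths.append(path)

        # Phase 2: ordered merge of the sorted chunk files.
        files = [open(p, "r", encoding="utf-8", newline="\n") for p in chunk_paths]
        try:
            # Strip the newline before comparing so the merge uses the same
            # ordering as the in-memory sort of each chunk.
            streams = [(line.rstrip("\n") for line in f) for f in files]
            yield from heapq.merge(*streams)
        finally:
            for f in files:
                f.close()
    finally:
        # Clean up the temporary chunk files.
        for p in chunk_paths:
            os.remove(p)
```

Something like `for row in external_sort(open("big.txt").read().splitlines())` would defeat the purpose, so the input should be streamed, for example a generator over the file that strips the trailing newline from each line. Note that this sketch opens every chunk file at once during the merge, so a very small chunk_size on a very large input could run into the open-file limit; a real implementation would merge in several passes.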