Error when joining a large data stream to a small one

Good day all,

I’m getting an error when joining a large data stream (118M rows) with a smaller one (276 rows).
The join is on a field called ‘column_name’ to ‘col’, and it’s an inner join.


The smaller data set contains data validation info (length, data type, etc) that I’m adding to each row of data, hence the join on the column name.
It works well with a smaller sample, but when I reach into the millions, the join snap fails.

See the error below:

Any assistance would be appreciated.

As you are not using sorted data, the Join snap may be trying to sort the large input stream. In this scenario, the In-Memory Lookup may give you a better result than the Join, because it doesn’t have to sort the input streams.
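
To illustrate the idea, here is a rough sketch in plain Python (not SnapLogic code) of how a lookup-style join can avoid sorting: the small input is loaded into a dictionary once, and the large input is streamed through one row at a time. The function name lookup_join and the sample fields "length", "data_type" and "value" are made up for the example; only ‘column_name’ and ‘col’ come from the post above. It also assumes each ‘col’ value appears once in the small input, otherwise the last row with that key wins.

```python
from typing import Dict, Iterable, Iterator


def lookup_join(
    large_rows: Iterable[Dict],
    small_rows: Iterable[Dict],
    large_key: str = "column_name",
    small_key: str = "col",
) -> Iterator[Dict]:
    """Inner-join a large stream to a small one without sorting either."""
    # Hold only the small side in memory, keyed by its join field.
    lookup = {row[small_key]: row for row in small_rows}

    # Stream the large side; memory use does not grow with its row count.
    for row in large_rows:
        match = lookup.get(row[large_key])
        if match is not None:  # inner join: unmatched rows are dropped
            yield {**row, **match}


if __name__ == "__main__":
    # Hypothetical sample data standing in for the two streams.
    validation = [{"col": "name", "length": 50, "data_type": "string"}]
    data = [
        {"column_name": "name", "value": "Ann"},
        {"column_name": "unknown", "value": "x"},
    ]
    for joined in lookup_join(data, validation):
        print(joined)
```

With 276 rows on the small side, the dictionary stays tiny regardless of how many rows pass through on the large side, which is why this pattern usually needs far less memory than a join that has to sort 118M rows first.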

Thanks @cstewart, I’ll give it a try after hours and see how it performs.

So I tried using the In-Memory Lookup, and I still got the same error.