Forum Discussion

philliperamos
Contributor
6 years ago

Error when joining a large data stream to a small one

Good day all,

I’m getting an error when joining a large data stream (118M rows) with a smaller one (276 rows).
The join matches the field ‘column_name’ to ‘col’, and it’s an inner join.

The smaller data set contains data validation info (length, data type, etc.) that I’m adding to each row of data, hence the join on the column name.
It works well with a smaller sample, but once the input reaches into the millions of rows, the Join snap fails.

See the error below:

Any assistance would be appreciated.

4 Replies

  • cstewart
    Former Employee

    Since your data is not sorted, the Join may be trying to sort the large input. In that case, the In-Memory Lookup may give you a better result than the Join, because it doesn’t have to sort the input streams.
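
To illustrate why an in-memory lookup avoids the sort, here is a minimal Python sketch of the general technique. It is not SnapLogic code and not how either snap is implemented. It assumes the small input has one row per key, and the function name `lookup_join` and the fields `value`, `length`, and `data_type` are made up for the example. Only `column_name` and `col` come from the original post.

```python
from typing import Any, Dict, Iterable, Iterator

Row = Dict[str, Any]


def lookup_join(
    large_rows: Iterable[Row],
    small_rows: Iterable[Row],
    large_key: str = "column_name",
    small_key: str = "col",
) -> Iterator[Row]:
    """Inner-join a large stream against a small table held in memory."""
    # Load the small input once into a dict keyed on the join field.
    # Assumption: keys are unique; a later duplicate would overwrite an earlier one.
    lookup = {row[small_key]: row for row in small_rows}

    # Stream the large input one row at a time; nothing is sorted or buffered.
    for row in large_rows:
        match = lookup.get(row[large_key])
        if match is None:
            continue  # inner join: drop rows with no match
        merged = dict(row)
        # Copy every field from the matching small row except its join key.
        merged.update({k: v for k, v in match.items() if k != small_key})
        yield merged


if __name__ == "__main__":
    # Hypothetical validation rules (the small input).
    rules = [
        {"col": "name", "length": 50, "data_type": "string"},
        {"col": "age", "length": 3, "data_type": "int"},
    ]
    # Hypothetical data rows (the large input, shown here as a tiny sample).
    data = [
        {"column_name": "name", "value": "Ann"},
        {"column_name": "zip", "value": "12345"},
        {"column_name": "age", "value": "41"},
    ]
    for joined in lookup_join(data, rules):
        print(joined)
```

Memory use depends only on the small input (276 rows in this case), while the 118M-row input is read once in whatever order it arrives. A sort-based join, by contrast, needs both inputs ordered on the join key before it can match them.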