Output preview not available for Inner Join

darshthakkar
Valued Contributor

Hi Team,

I’m using an Inner Join and the output preview doesn’t appear. Is this expected behaviour?
I do, however, have access to the data I’m expecting out of the Inner Join (since I’m writing it to an Excel file), but adding a Mapper after the Inner Join doesn’t help because there is no data in the Mapper’s input schema. How do I filter out the data I don’t need from the Inner Join when the preview is not available and the input schema shows nothing?
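To make the intent clearer, here is roughly what I’m trying to do after the Inner Join, expressed as a plain Python sketch (not SnapLogic configuration; the record layouts and field names are made up for illustration):

```python
# Rough illustration of "inner join, then drop the fields I don't need".
# All field names here are invented for the example.

orders = [
    {"id": 1, "customer_id": 10, "amount": 250},
    {"id": 2, "customer_id": 11, "amount": 75},
    {"id": 3, "customer_id": 99, "amount": 40},  # no matching customer
]
customers = [
    {"customer_id": 10, "name": "Acme", "region": "EMEA"},
    {"customer_id": 11, "name": "Globex", "region": "APAC"},
]

# Inner join on customer_id: only rows with a match on both sides survive.
by_id = {c["customer_id"]: c for c in customers}
joined = [
    {**o, **by_id[o["customer_id"]]}
    for o in orders
    if o["customer_id"] in by_id
]

# The "Mapper"-style step: keep only the fields downstream stages need.
wanted = ("id", "name", "amount")
result = [{k: row[k] for k in wanted} for row in joined]

print(result)
# [{'id': 1, 'name': 'Acme', 'amount': 250},
#  {'id': 2, 'name': 'Globex', 'amount': 75}]
```

Without the preview or an input schema, I can’t configure this kind of field selection in the Mapper.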

I also have access to the other (filtered-out) data, since I’ve enabled the error view on the Inner Join, and that view does have an output preview.

Appreciate your help and time on this.

Best Regards,
Darsh

26 REPLIES

darshthakkar
Valued Contributor

@ptaylor: What would you suggest using instead of an Inner Join, then? I need the output because there are a lot of transformations I have to do after the Inner Join.

I believe this might also have been the case with Distinct results while saving/validating pipeline vs executing pipeline - #10 by ptaylor, though getting null values was very surprising. Once I get a chance to test that, I’ll keep you posted on that thread itself. For now, let’s stay focused on the workaround for the Inner Join (it’s very easy for me to get distracted when the two phenomena are somewhat similar).

You need to develop the pipeline using a smaller set of data so that each stage will have output.

That is a challenge, as I cannot meddle with the source system, which holds a huge amount of data (millions of rows); with all the joins in place, we reduce it to a couple of thousand.

Would you suggest fetching smaller sets of data and then merging them at the end while designing the pipeline?

It’s quite common when developing and testing software to use very small sample data sets, either dummy data (made up for testing purposes) or a subset of the real data. This also lets you verify the expected results more easily. You can’t do that if you’re testing with millions of rows. After you validate the correct functioning of your pipelines against the test data, you create a version that runs against the real data.
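For example, the “small subset” idea can be as simple as taking a reproducible random sample of the real data and verifying each transformation by hand against it (a generic Python sketch, nothing SnapLogic-specific; the data shape is made up):

```python
import random

# Stand-in for the "real" source: far too big to eyeball row by row.
real_rows = [{"id": i, "value": i * 3} for i in range(1_000_000)]

# Take a small, reproducible sample to develop and test against.
random.seed(42)
sample = random.sample(real_rows, 20)

# With 20 rows, each stage's expected output can be checked by hand, e.g.:
transformed = [{"id": r["id"], "doubled": r["value"] * 2} for r in sample]
assert all(t["doubled"] == s["value"] * 2 for t, s in zip(transformed, sample))

print(len(sample), "rows in the test subset")
# 20 rows in the test subset
```

Once the logic is validated on the sample, the same pipeline is pointed at the full data set.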

Gotcha, thanks for your suggestions.
Usually I use a Head or Tail snap to reduce the number of records and test my functionality while developing. But when I’m dealing with multiple pipelines and the requirements change after a while, it takes me time to rethink the design, and while I’m redesigning I get stuck on the next round of unit testing against the updated requirements.