Forum Discussion
@ptaylor: What would you suggest using instead of an Inner Join then? I need its output, as there are a lot of transformations that I have to do after the Inner Join.
I believe this might also have been the case in Distinct results while saving/validating pipeline vs executing pipeline - #10 by ptaylor, but getting Null values was very surprising. Once I get a chance to test that, I will keep you posted on that thread itself. For now, let’s just stay focused on the workaround for the Inner Join (it’s very easy for me to get distracted when the 2 phenomena are somewhat similar).
- ptaylor, 4 years ago, Employee
You need to develop the pipeline using a smaller set of data so that each stage will have output.
- darshthakkar, 4 years ago, Valued Contributor
That is a challenge, as I cannot meddle with the source system, which has a huge amount of data (millions of rows); with all the joins in place, we reduce it to a couple of thousand rows.
Would you suggest that I fetch smaller sets of data and then merge them at the end while designing the pipeline?
- ptaylor, 4 years ago, Employee
It’s quite common when developing and testing software to use very small sample data sets, either dummy data (made up for testing purposes) or a subset of the real data. This also lets you verify the expected results more easily. You can’t do that if you’re testing with millions of rows. After you validate the correct functioning of your pipelines against the test data, you create a version that runs against the real data.
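To illustrate the subset idea outside of any particular tool, here is a minimal Python sketch (not SnapLogic code). It assumes two in-memory inputs, `orders` and `customers`, joined on a made-up `customer_id` field; the data, the field names, and the helpers `sample_for_join` and `inner_join` are all hypothetical. The point is that the sample is chosen so the join keys overlap, which means the inner join still produces rows for the downstream stages to work on.

```python
def sample_for_join(left, right, key, n):
    """Pick up to n left rows whose key also exists in right,
    plus every right row that matches one of those keys."""
    right_keys = {row[key] for row in right}
    left_sample = []
    for row in left:
        if row[key] in right_keys:
            left_sample.append(row)
            if len(left_sample) == n:
                break
    sample_keys = {row[key] for row in left_sample}
    right_sample = [row for row in right if row[key] in sample_keys]
    return left_sample, right_sample


def inner_join(left, right, key):
    """Plain inner join: keep only left rows that have a match in right."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Hypothetical "large" inputs standing in for the real source data.
orders = [{"customer_id": i % 50, "order_id": i} for i in range(1000)]
customers = [{"customer_id": i, "name": f"customer-{i}"} for i in range(0, 50, 5)]

# Small test subset whose keys are guaranteed to overlap.
orders_sample, customers_sample = sample_for_join(orders, customers, "customer_id", 3)
joined = inner_join(orders_sample, customers_sample, "customer_id")

for row in joined:
    print(row)
```

With a sample built this way, the expected join result is small enough to check by hand before the same logic is pointed at the full data.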