Forum Discussion
That is a challenge, as I cannot meddle with the source system, which holds a huge amount of data (millions of rows); with all the joins in place, we reduce it to a couple of thousand rows.
Would you suggest fetching smaller sets of data and then merging them at the end when designing a pipeline?
It’s quite common when developing and testing software to use very small sample data sets, either dummy data (made up for testing purposes) or a subset of the real data. This also lets you verify the expected results more easily. You can’t do that if you’re testing with millions of rows. After you validate the correct functioning of your pipelines against the test data, you create a version that runs against the real data.
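To make that concrete, here is a minimal sketch (not SnapLogic-specific) of the idea: carve out a small, reproducible subset of the real data, run the pipeline logic against it, and check the output against hand-verified expected results. It assumes the extract is available as a CSV and uses pandas; the file names, the `transform` placeholder, and the sample size are all hypothetical.

```python
# Minimal sketch: derive a small test subset from the real extract,
# run the pipeline's transformation against it, and compare to expected output.
import pandas as pd


def make_test_subset(source_csv: str, n_rows: int = 500, seed: int = 42) -> pd.DataFrame:
    """Take a small random sample of the real data so results stay easy to verify."""
    full = pd.read_csv(source_csv)
    return full.sample(n=min(n_rows, len(full)), random_state=seed)


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Placeholder for the pipeline's join/filter logic under test."""
    return df  # replace with the real transformation


if __name__ == "__main__":
    subset = make_test_subset("source_extract.csv")    # hypothetical source file
    subset.to_csv("test_subset.csv", index=False)      # reusable test fixture
    expected = pd.read_csv("expected_output.csv")      # hand-checked expected results
    actual = transform(subset)
    pd.testing.assert_frame_equal(
        actual.reset_index(drop=True), expected.reset_index(drop=True)
    )
    print("Pipeline logic matches expected results on the test subset.")
```

Once the logic passes against the fixture, the same transformation can be pointed at the full source, as described above.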
- darshthakkar, Valued Contributor (4 years ago)
Gotcha, thanks for your suggestions.
Usually I use a head or tail snap to reduce the number of records and test my functionality while developing. But when I'm dealing with multiple pipelines and the requirements change after a while, it takes me a while to rethink the design, and while I'm redesigning it I get stuck when I go through another round of unit testing against the updated requirements.