01-02-2018 06:32 AM
I have a very large file (around 2 GB). I need to fetch it from FTP, split it into chunks (let's say 2 million records each), and write the chunks back to the FTP server as new files. Can anyone suggest an efficient way to process the data? I tried using a Router and a Sequence. Is there a better way to do it?
01-02-2018 01:30 PM
Have you tried the Group By N snap to split it based on record count?
01-03-2018 01:43 AM
Hi @aleung ,
We have implemented this with the Sequence generator and Router. It took 1 hour to process the whole file and create new files with 2.5 million records each, which is pretty good. Still, I am curious whether we can tune this to take less time. Could you help with this?
01-03-2018 10:46 AM
This type of job is best done at the endpoints rather than in a snap. Python is really good at this kind of job, so it is best if you can do the splitting either before or after the transfer. I know this isn't ideal, but I can't think of a better way.
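As a rough illustration of the Python approach, here is a minimal sketch of splitting a file by record count on the endpoint. It assumes one record per line and no header row, neither of which is stated in the thread. The function name `split_file`, the `prefix` parameter, and the output naming scheme are hypothetical, chosen only for this example. The FTP download and upload are not shown.

```python
import os


def split_file(src_path, out_dir, records_per_chunk=2_000_000, prefix="chunk"):
    """Split src_path into files of at most records_per_chunk lines each.

    Assumes one record per line and no header row (an assumption, not
    something confirmed in the thread). Returns the chunk paths in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    out = None
    try:
        # Binary mode avoids decoding overhead and keeps line endings as-is.
        with open(src_path, "rb") as src:
            # Iterating the file streams it line by line, so the whole
            # 2 GB input is never held in memory at once.
            for index, line in enumerate(src):
                if index % records_per_chunk == 0:
                    # Start a new chunk file every records_per_chunk lines.
                    if out is not None:
                        out.close()
                    name = f"{prefix}_{len(chunk_paths) + 1:04d}.txt"
                    path = os.path.join(out_dir, name)
                    out = open(path, "wb")
                    chunk_paths.append(path)
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return chunk_paths
```

Calling `split_file("big_file.txt", "chunks")` (both names are placeholders) would write `chunk_0001.txt`, `chunk_0002.txt`, and so on into the `chunks` directory, ready to be uploaded separately.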