How to improve pipeline performance
Hi,
I am reading multiple files from an S3 location, and each file contains multiple JSON records. For each record I have to look up an id value and, based on that value, create an individual JSON file for that single record and write it to a specific directory in S3. My pipeline runs fine, but its performance is not that good. For example, processing 474 documents with a total of 78,526 records took 2 hr 30 min to write 78,526 files to another S3 directory (about 8.7 files per second), which I believe is too slow.
I am attaching my pipelines. If any improvement can be made to them, please let me know. I would really appreciate your suggestions.
This is the flow of my pipelines:
pl_psychometric_analysis_S3_child.slp → pl_psychometric_analysis_S3_split_events.slp → pl_psychometric_analysis_S3_write_events.slp
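To make the logic concrete, here is a rough Python sketch of what the pipelines do for one input file. It is only an illustration, not the SnapLogic implementation. The one-record-per-line layout, the top-level "id" field, the split_records helper, and the output key pattern are all assumptions made for the example.

```python
import json


def split_records(file_text, out_prefix):
    """Map each record in one input file to its own output key.

    Assumes one JSON record per line and a top-level "id" field
    (both hypothetical; the real files may be laid out differently).
    Returns {output_key: json_body}, one entry per record.
    """
    outputs = {}
    for line_no, line in enumerate(file_text.splitlines()):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        record_id = record["id"]
        # Hypothetical key pattern: the id selects the target directory,
        # and the line number keeps keys unique within that directory.
        key = f"{out_prefix}/{record_id}/record_{line_no}.json"
        outputs[key] = json.dumps(record)
    return outputs


sample = '{"id": "a", "score": 1}\n{"id": "b", "score": 2}\n'
result = split_records(sample, "events")
for key in sorted(result):
    print(key)
```

Each entry in the result corresponds to one S3 write, so 78,526 records mean 78,526 separate write requests.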
Thanks
Aditya
pl_psychometric_analysis_S3_child.slp (16.4 KB)
pl_psychometric_analysis_S3_split_events.slp (9.4 KB)
pl_psychometric_analysis_S3_write_events.slp (7.8 KB)