Forum Discussion
@viktor_n The maximum file size of 100 MB applies only to files uploaded to SLDB (the Files tab in Manager when viewing a project or project space). If you're using a File Reader, S3 File Reader, or something similar, you can read larger files.
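For context only, here is a minimal Python sketch of the streaming idea outside of SnapLogic, using boto3 with a hypothetical bucket, key, and chunk size. It just illustrates why a streaming reader is not bound by an upload size limit, since the file never has to fit in memory at once:

```python
# Minimal sketch: streaming a large object from S3 in fixed-size chunks.
# The bucket, key, and chunk size are illustrative placeholders, not values
# from this thread.
import boto3

s3 = boto3.client("s3")
response = s3.get_object(Bucket="example-bucket", Key="large-input.xml")

bytes_read = 0
# iter_chunks() yields the body incrementally instead of reading it whole.
for chunk in response["Body"].iter_chunks(chunk_size=8 * 1024 * 1024):
    bytes_read += len(chunk)
    # ... hand each chunk to downstream processing here ...

print(f"Streamed {bytes_read} bytes without loading the whole file into memory")
```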
@acmohan023 Can you share screenshots of the pipeline and pipeline statistics? An example of the files, or a clearer example of what you're trying to map, would also be helpful. If I understand your initial description correctly, you may be doing Group By N or Group By Field operations and copying to multiple targets, both of which will impact memory on the node.
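As a conceptual illustration only (not SnapLogic's implementation), grouping by a field has to buffer every record for a key until the group can be emitted, which is why memory grows with the size of the largest groups. The field name and records below are made up:

```python
# Conceptual sketch: grouping by a field keeps whole records resident in
# memory until the groups are emitted.
from collections import defaultdict

def group_by_field(records, field):
    groups = defaultdict(list)
    for record in records:
        groups[record[field]].append(record)  # whole record held in memory
    return groups

records = [
    {"input": "A", "value": 1},
    {"input": "B", "value": 2},
    {"input": "A", "value": 3},
]
grouped = group_by_field(records, "input")
# {'A': [..., ...], 'B': [...]} -- every record stays resident until this point
```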
@rsramkoski - I have attached the sample input and output structures for your reference. (I had pasted the request and response structures earlier, but they were not visible.)
InputAndOutpultSample.txt (1.3 KB)
As the file size is huge, I tried to split the flow into multiple pipelines (to process the data in chunks and release memory); a rough sketch of this approach is included after the list:
- Read the file from SFTP and use XML Parse to split the data, then Group By N and call the 2nd pipeline
- Split the group by sorting on the field (the input field in the attached file) and call the 3rd pipeline
- Write the smaller chunk files to the local SFTP
- On completion of all three pipelines above, read the small chunk files based on file name and write a consolidated file per group-by field (the input field in the attached file)
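Here is a rough Python sketch of the chunk-and-consolidate idea above, written as plain Python rather than SnapLogic Snaps. The file paths, the "input" group field, and the chunk size are placeholders standing in for the SFTP locations and Snap settings in the actual pipelines:

```python
# Rough sketch of the chunk-and-consolidate flow, with illustrative
# placeholders for paths, the group field, and the chunk size.
import json
from collections import defaultdict
from pathlib import Path

CHUNK_SIZE = 1000            # stand-in for "Group By N"
CHUNK_DIR = Path("chunks")   # stand-in for the intermediate SFTP location
CHUNK_DIR.mkdir(exist_ok=True)

def write_chunks(records, group_field="input"):
    """Stages 1-3: split the record stream into small per-group chunk files."""
    buffer, part = defaultdict(list), 0
    for i, record in enumerate(records, start=1):
        buffer[record[group_field]].append(record)
        if i % CHUNK_SIZE == 0:          # flush every N records to cap memory
            flush(buffer, part)
            buffer, part = defaultdict(list), part + 1
    if buffer:                           # flush any remaining partial chunk
        flush(buffer, part)

def flush(buffer, part):
    for key, group in buffer.items():
        (CHUNK_DIR / f"{key}_part{part}.json").write_text(json.dumps(group))

def consolidate():
    """Stage 4: append each chunk file to one consolidated file per group key,
    so only one small chunk is in memory at a time."""
    for path in sorted(CHUNK_DIR.glob("*_part*.json")):
        key = path.name.rsplit("_part", 1)[0]
        records = json.loads(path.read_text())
        with open(f"{key}_consolidated.jsonl", "a") as out:
            for record in records:
                out.write(json.dumps(record) + "\n")
```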
Please suggest if there is any other approach to resolve this issue.
- acmohan023 · 5 years ago · New Contributor
Team,
Can you please help, and let me know if any further details are required?
- acmohan023 · 5 years ago · New Contributor
Can you please guide me on how to resolve this issue?
- RogerSramkoski · 5 years ago · Employee
@acmohan023 - Thank you for providing the sample input and output. Since the post is about CPU and memory issues, could you please share a sanitized version of your pipelines as well? By sanitize, I mean removing values from any fields that reveal SFTP hostnames, IP addresses, accounts, or other sensitive details - it's just the overall logic I would like to see.