
Parsing XML Data and formatting it

acmohan023
New Contributor

Hi Team,

We have a requirement to parse an XML file (1.5 GB), transform/group its content based on one of the field values, and write multiple output files, one per group.

=== Sample Input ===

<?xml version="1.0" encoding="UTF-8" ?>

test

test

Test1 Test2 Test1

=== Output File 1 ===

<?xml version="1.0" encoding="UTF-8" ?>

test

test

Test1 Test1

=== Output File 2 ===

<?xml version="1.0" encoding="UTF-8" ?>

test

test

Test2

I have tried using the XML Parser snap to split on the child elements and then add the headers back. The problem is that, because the data is huge, CPU and memory usage go very high and I get a "Connection lost" error.

I have also tried XSLT, but ran into the same issue.

Can you please help me design a solution that is memory-optimized?
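For illustration, a streaming version of the split I am attempting might look like the rough Python sketch below. The element names <record> and <type> and the output file names are placeholders only, since the real structure is in the attached sample:

```python
import xml.etree.ElementTree as ET

XML_HEADER = '<?xml version="1.0" encoding="UTF-8" ?>\n<records>\n'
XML_FOOTER = '</records>\n'

def split_by_field(source_path, group_field="type"):
    """Stream the source XML and write one output file per group value."""
    writers = {}  # group value -> open output file handle

    # iterparse streams the document, so the full 1.5 GB tree is never held in memory.
    context = ET.iterparse(source_path, events=("start", "end"))
    _, root = next(context)  # grab the root element so we can keep clearing it

    for event, elem in context:
        if event != "end" or elem.tag != "record":
            continue
        group = elem.findtext(group_field, default="unknown")
        out = writers.get(group)
        if out is None:
            out = open(f"output_{group}.xml", "w", encoding="utf-8")
            out.write(XML_HEADER)
            writers[group] = out
        out.write(ET.tostring(elem, encoding="unicode"))
        root.clear()  # drop processed records so memory use stays flat

    for out in writers.values():
        out.write(XML_FOOTER)
        out.close()

split_by_field("input.xml")
```

With this kind of approach only one record is held in memory at a time, regardless of the total file size.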

Thanks in advance.


viktor_n
Contributor II

Hi @acmohan023,

The maximum acceptable file size in SnapLogic is 100 MB.

https://docs-snaplogic.atlassian.net/wiki/spaces/SD/pages/1439404/Files#:~:text=You%20can%20manage%2....

Regards,
Viktor

RogerSramkoski
Employee

@viktor_n The maximum file size of 100MB is only for files uploaded to SLDB (Files tab in Manager when looking at a project or project space). If you're using a File Reader, S3 File Reader, or something similar you can read larger files.

@acmohan023 Can you share screenshots of the pipeline and pipeline statistics? An example of the files or a clearer example of what you're trying to map would also be helpful. If I understand your initial description correctly, you may be doing a Group By N or Group By Field operation and copying to multiple targets, both of which will impact memory on the node.

Hi @viktor_n ,

Yes, I am aware of the SLDB file size limit. The files are read from SFTP, not from SnapLogic Manager.

@rsramkoski - I have attached a sample of the input and output structure for your reference. (I had pasted the request and response structure inline, but it is not visible.)
InputAndOutpultSample.txt (1.3 KB)

As the file size is huge, I tried to split the flow into multiple pipelines (to process it in chunks and release memory):

  1. Read the file from SFTP, use XML Parse to split the data, Group By N, and call the 2nd pipeline.
  2. Split the group, sort by the grouping field (the input field in the attached file), and call the 3rd pipeline.
  3. Write the smaller chunk files to a local SFTP.
  4. After all three pipelines above complete, read the small chunk files by file name and write one consolidated file per group-by field value (see the sketch after this list).
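For step 4, a rough Python sketch of the consolidation is below. The chunk file naming convention (chunk_<group>_<n>.xml) is only an assumed example, and the chunks are assumed to contain just the record fragments without an XML declaration:

```python
import glob
import re
import shutil
from collections import defaultdict

XML_HEADER = '<?xml version="1.0" encoding="UTF-8" ?>\n<records>\n'
XML_FOOTER = '</records>\n'

def consolidate_chunks(chunk_dir="."):
    # Bucket chunk files by the group value embedded in the file name.
    groups = defaultdict(list)
    for path in sorted(glob.glob(f"{chunk_dir}/chunk_*_*.xml")):
        match = re.search(r"chunk_(.+)_\d+\.xml$", path)
        if match:
            groups[match.group(1)].append(path)

    # Stream each group's chunks into one consolidated file per group.
    for group, paths in groups.items():
        with open(f"{chunk_dir}/consolidated_{group}.xml", "w", encoding="utf-8") as out:
            out.write(XML_HEADER)
            for path in paths:
                with open(path, "r", encoding="utf-8") as src:
                    # copyfileobj copies in fixed-size buffers, so memory
                    # use stays flat regardless of chunk size.
                    shutil.copyfileobj(src, out)
            out.write(XML_FOOTER)

consolidate_chunks()
```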

Please suggest if there is any other approach to resolve this issue.