Forum Discussion

Sahil
Contributor
4 years ago
Solved

XML split in pipeline

I have a scenario where the main XML needs to be split into multiple XMLs, one per <Line> element.

Main xml:-

<?xml version="1.0" encoding="UTF-8"?>
<Order>
  <Main>
    <Version>2.1</Version>
    <SenderID>55544566</SenderID>
  </Main>
  <Data>
    <OrderNumber>564434555</OrderNumber>
    <OrderDate>2021-07-15</OrderDate>
    <CurrencyCode>eur</CurrencyCode>
  </Data>
  <Lines>
    <Line>
      <OrderLineNumber>10</OrderLineNumber>
      <VendorOrderLineNumber>10000</VendorOrderLineNumber>
      <VendorPartNumber>66554</VendorPartNumber>
    </Line>
    <Line>
      <OrderLineNumber>20</OrderLineNumber>
      <VendorOrderLineNumber>20000</VendorOrderLineNumber>
      <VendorPartNumber>6775</VendorPartNumber>
    </Line>
  </Lines>
</Order>

Split xml 1:-

<?xml version="1.0" encoding="UTF-8"?>
<Order>
  <Main>
    <Version>2.1</Version>
    <SenderID>55544566</SenderID>
  </Main>
  <Data>
    <OrderNumber>564434555</OrderNumber>
    <OrderDate>2021-07-15</OrderDate>
    <CurrencyCode>eur</CurrencyCode>
  </Data>
  <Lines>
    <Line>
      <OrderLineNumber>10</OrderLineNumber>
      <VendorOrderLineNumber>10000</VendorOrderLineNumber>
      <VendorPartNumber>66554</VendorPartNumber>
    </Line>
  </Lines>
</Order>

Split xml 2:-

<?xml version="1.0" encoding="UTF-8"?>
<Order>
  <Main>
    <Version>2.1</Version>
    <SenderID>55544566</SenderID>
  </Main>
  <Data>
    <OrderNumber>564434555</OrderNumber>
    <OrderDate>2021-07-15</OrderDate>
    <CurrencyCode>eur</CurrencyCode>
  </Data>
  <Lines>
    <Line>
      <OrderLineNumber>20</OrderLineNumber>
      <VendorOrderLineNumber>20000</VendorOrderLineNumber>
      <VendorPartNumber>6775</VendorPartNumber>
    </Line>
  </Lines>
</Order>

How do I achieve this?

10 Replies

  • Supratim
    Contributor III

    @Sahil you can try the attached pipeline. Replace the input XML Mapper with a File Reader Snap followed by a Binary to Document Snap, and change the input value from xml_multi_content to $content in the "add identify tag" Mapper Snap.

    test_2021_07_28.slp (9.8 KB)

    • Sahil
      Contributor

      Hi, I realized that the XML samples were not correctly formatted.
      I have edited my question.

      • Supratim
        Contributor III

        @Sahil You can still use the same pipeline; just swap the XML in my pipeline for yours.

  • Hello @Sahil!

    I’m attaching two pipelines that I believe get you two separate files like you’re looking for. Here are the two files in Manager > MyProject > Files (you can rename them):

    Order_564434555_Line_20.xml looks like this:

    The extra XML is the byproduct of moving it through the XML Formatter. If this is on the right track let us know and we can work next on getting that removed if needed.

    Here is the parent pipeline:

    The JSON Splitter is where the magic happens and here is how I configured it with the “Include Paths” section so Data and Main make it through to each document. The keys will be out of order, so the Mapper helps us rearrange all of it.

    If you use an XML Formatter and File Writer immediately after the Mapper, it writes one file. Since we want these to be two separate files I’m using a Pipeline Execute Snap for file writing with parameterized names.
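To make the data flow concrete, here is a minimal Python sketch of what the split step is described as doing: emit one document per `Line` entry while carrying `Main` and `Data` along, and derive a per-line file name in the style of `Order_564434555_Line_20.xml`. This is not SnapLogic code. The function name `split_order`, the dictionary layout, and the idea that the line fields land at the root next to an `Order` object are illustrative assumptions about the document shape, not the Snap's actual implementation.

```python
def split_order(doc):
    """Yield (file_name, flat_document) pairs, one per Line entry."""
    order = doc["Order"]
    for line in order["Lines"]["Line"]:
        # Assumption: after the split, the line fields sit at the root.
        flat = dict(line)
        # Rough equivalent of "Include Paths": keep Main and Data in every output.
        flat["Order"] = {"Main": order["Main"], "Data": order["Data"]}
        name = f"Order_{order['Data']['OrderNumber']}_Line_{line['OrderLineNumber']}.xml"
        yield name, flat


sample = {
    "Order": {
        "Main": {"Version": "2.1", "SenderID": "55544566"},
        "Data": {
            "OrderNumber": "564434555",
            "OrderDate": "2021-07-15",
            "CurrencyCode": "eur",
        },
        "Lines": {
            "Line": [
                {"OrderLineNumber": "10", "VendorOrderLineNumber": "10000", "VendorPartNumber": "66554"},
                {"OrderLineNumber": "20", "VendorOrderLineNumber": "20000", "VendorPartNumber": "6775"},
            ]
        },
    }
}

results = list(split_order(sample))
for name, flat in results:
    print(name, sorted(flat))
```

Each yielded pair corresponds to one child pipeline execution: the name plays the role of the file name parameter, and the flat document is what would be formatted and written.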

    Here are the pipelines so you can check it out for yourself:
    Community_SplitXML_Parent_2021_07_29.slp (7.9 KB)
    Community_SplitXML_Child_2021_07_29.slp (3.9 KB)

  • Sahil
    Contributor

    Hi @rsramkoski, @Supratim ,

    Thank you for the example pipelines; we are on the right track.
    I tried the two pipelines above, but there are two issues, as you can see by comparing the output with the XML examples above.

    1. Extra XML appears from <DocumentRoot> down to <Data>. I process the split XMLs again downstream, so I need each split XML to match the example exactly.

    2. The <Lines> and <Line> segments are missing from the output XML.
    The Main XML above is just an example. We may also receive an XML with only one <Line> segment, which does not need to be split at all. I mention this in case it was not clear earlier.

    Example
    Main XML:-

    <?xml version="1.0" encoding="UTF-8"?>
    <Order>
      <Main>
        <Version>2.1</Version>
        <SenderID>55544566</SenderID>
      </Main>
      <Data>
        <OrderNumber>564434555</OrderNumber>
        <OrderDate>2021-07-15</OrderDate>
        <CurrencyCode>eur</CurrencyCode>
      </Data>
      <Lines>
        <Line>
          <OrderLineNumber>10</OrderLineNumber>
          <VendorOrderLineNumber>10000</VendorOrderLineNumber>
          <VendorPartNumber>66554</VendorPartNumber>
          <Orgs>
            <Org>
              <Type>All</Type>
              <Quantity>2</Quantity>
            </Org>
          </Orgs>
          <Code>OK</Code>
        </Line>
        <Line>
          <OrderLineNumber>20</OrderLineNumber>
          <VendorOrderLineNumber>20000</VendorOrderLineNumber>
          <VendorPartNumber>6775</VendorPartNumber>
          <Orgs>
            <Org>
              <Type>All</Type>
              <Quantity>3</Quantity>
            </Org>
          </Orgs>
          <Code>OK</Code>
        </Line>
      </Lines>
    </Order>
    

    Split XML 1:-

    <?xml version="1.0" encoding="UTF-8"?>
    <Order>
      <Main>
        <Version>2.1</Version>
        <SenderID>55544566</SenderID>
      </Main>
      <Data>
        <OrderNumber>564434555</OrderNumber>
        <OrderDate>2021-07-15</OrderDate>
        <CurrencyCode>eur</CurrencyCode>
      </Data>
      <Lines>
        <Line>
          <OrderLineNumber>10</OrderLineNumber>
          <VendorOrderLineNumber>10000</VendorOrderLineNumber>
          <VendorPartNumber>66554</VendorPartNumber>
          <Orgs>
            <Org>
              <Type>All</Type>
              <Quantity>2</Quantity>
            </Org>
          </Orgs>
          <Code>OK</Code>
        </Line>
      </Lines>
    </Order>
    

    Split XML 2:-

    <?xml version="1.0" encoding="UTF-8"?>
    <Order>
      <Main>
        <Version>2.1</Version>
        <SenderID>55544566</SenderID>
      </Main>
      <Data>
        <OrderNumber>564434555</OrderNumber>
        <OrderDate>2021-07-15</OrderDate>
        <CurrencyCode>eur</CurrencyCode>
      </Data>
      <Lines>
        <Line>
          <OrderLineNumber>20</OrderLineNumber>
          <VendorOrderLineNumber>20000</VendorOrderLineNumber>
          <VendorPartNumber>6775</VendorPartNumber>
          <Orgs>
            <Org>
              <Type>All</Type>
              <Quantity>3</Quantity>
            </Org>
          </Orgs>
          <Code>OK</Code>
        </Line>
      </Lines>
    </Order>
    
  • viktor_n
    Contributor II

    Hi @Sahil,

    I am attaching pipelines that I think will solve your problem.
    They are similar to @rsramkoski's pipelines, with a few small changes.

    Here are the pipelines:
    Split_XML_Parent_2021_07_31.slp (9.8 KB)
    Split_XML_Child_2021_07_31.slp (4.0 KB)

    1. The XML structure was different because of the XML Formatter Snap. If you do not remove the DocumentRoot value from the Root Element field, the Snap automatically adds a root element so that the process does not fail when more than one document arrives on the input. I removed the value from Root Element, so the Snap no longer adds another root element on top of the current one.

    2. The data from $Lines.Line was moved to the root by the Splitter Snap, so in the Mapper after the Splitter I wrote this expression (you can also map the fields individually without the expression if you prefer):
      $Order.extend({Lines: {Line: $.filter((x,k) => k != "Order")}})

    • In case the expression is unclear, here is what it does:
      It extends the Order object with a Lines.Line object that contains every field from the root except Order itself.
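For readers more comfortable with a general-purpose language, the same reshaping can be sketched in Python. This is only an analogy for the Mapper expression, not SnapLogic code: the function name `rebuild_order` and the sample `flat` document are hypothetical, and copying `Order` before adding `Lines` is a choice made for this sketch.

```python
def rebuild_order(flat):
    """Nest the root-level line fields back under Order.Lines.Line."""
    # Counterpart of $.filter((x,k) => k != "Order"): every root field except Order.
    line = {key: value for key, value in flat.items() if key != "Order"}
    # Counterpart of $Order.extend({...}): merge the Lines key into a copy of Order.
    order = dict(flat["Order"])
    order["Lines"] = {"Line": line}
    return order


# Hypothetical document shape after the Splitter: line fields at the root,
# Main and Data kept under Order.
flat = {
    "OrderLineNumber": "20",
    "VendorOrderLineNumber": "20000",
    "VendorPartNumber": "6775",
    "Orgs": {"Org": {"Type": "All", "Quantity": "3"}},
    "Code": "OK",
    "Order": {
        "Main": {"Version": "2.1", "SenderID": "55544566"},
        "Data": {
            "OrderNumber": "564434555",
            "OrderDate": "2021-07-15",
            "CurrencyCode": "eur",
        },
    },
}

rebuilt = rebuild_order(flat)
print(list(rebuilt))  # ['Main', 'Data', 'Lines']
```

The key order of the result (Main, Data, Lines) matches the element order expected in the split XML.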

    Here is an example of what one output file looks like:

    <?xml version='1.0' encoding='UTF-8'?>
    <Order>
        <Main>
            <Version>2.1</Version>
            <SenderID>55544566</SenderID>
        </Main>
        <Data>
            <OrderNumber>564434555</OrderNumber>
            <OrderDate>2021-07-15</OrderDate>
            <CurrencyCode>eur</CurrencyCode>
        </Data>
        <Lines>
            <Line>
                <OrderLineNumber>20</OrderLineNumber>
                <VendorOrderLineNumber>20000</VendorOrderLineNumber>
                <VendorPartNumber>6775</VendorPartNumber>
                <Orgs>
                    <Org>
                        <Type>All</Type>
                        <Quantity>3</Quantity>
                    </Org>
                </Orgs>
                <Code>OK</Code>
            </Line>
        </Lines>
    </Order>
    
    • Sahil
      Contributor

      Hi,
      It works for XML with multiple <Line> segments, but it does not work for XML that has just one <Line> segment.
      In practice we may receive either a batch XML or a single-line XML, and a single-line XML should simply be passed through without splitting.
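The required behavior (split a batch, pass a single-line order through untouched) can be pinned down with a small Python sketch using the standard library's `xml.etree.ElementTree`. This is not a SnapLogic pipeline and does not explain why the attached pipelines fail on a single <Line>. The function name `split_order_xml` and the shortened sample documents are assumptions made only for illustration.

```python
import copy
import xml.etree.ElementTree as ET


def split_order_xml(xml_text):
    """Return one XML string per <Line>; pass single-line orders through as-is."""
    root = ET.fromstring(xml_text)
    lines = root.findall("Lines/Line")
    if len(lines) <= 1:
        return [xml_text]  # zero or one <Line>: nothing to split

    outputs = []
    for index in range(len(lines)):
        clone = copy.deepcopy(root)
        container = clone.find("Lines")
        # Drop every <Line> except the one at this position.
        for position, line in enumerate(container.findall("Line")):
            if position != index:
                container.remove(line)
        outputs.append(ET.tostring(clone, encoding="unicode"))
    return outputs


# Shortened sample documents (hypothetical, trimmed from the examples above).
HEADER = "<Main><Version>2.1</Version></Main><Data><OrderNumber>564434555</OrderNumber></Data>"
LINE_10 = "<Line><OrderLineNumber>10</OrderLineNumber></Line>"
LINE_20 = "<Line><OrderLineNumber>20</OrderLineNumber></Line>"
ONE_LINE = f"<Order>{HEADER}<Lines>{LINE_10}</Lines></Order>"
TWO_LINES = f"<Order>{HEADER}<Lines>{LINE_10}{LINE_20}</Lines></Order>"

single = split_order_xml(ONE_LINE)
batch = split_order_xml(TWO_LINES)
print(len(single), len(batch))  # 1 2
```

Because every output is a deep copy of the whole document with the other <Line> elements removed, <Main>, <Data>, and the <Lines> wrapper are preserved in each split file. The strings produced by `ET.tostring` in this sketch do not include the XML declaration.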