Forum Discussion

deepak_shaw's avatar
deepak_shaw
Contributor
4 years ago
Solved

CSV to XML with repetitive xml nodes

Hi,
I’m trying to create a pipeline which suppose to create an xml file out of a CSV file.

CSV file looks like -
|ProductCode|ProductName|Price|Country|
|A001|DellLaptop|$400|US|
|A001|DellLaptop|$500|AU|
|A001|DellLaptop|$550|EU|

XML output should be -

<?xml version="1.0" encoding="UTF-8"?> US $400 AU $500 EU $550 [XML file attached as its not coming well in the question well..] Thanks for your answer, Deepak Shaw [Product.zip|attachment](upload://7NZiA8X09vabLf1eK3iLKeLDU5f.zip) (203 Bytes)
  • robin's avatar
    robin
    4 years ago

    Just a small change in your Mapper needed:

    That will output JSON like:

    [{
    	"Invoice": {
    		"@xmlns:cac": "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2",
    		"@xmlns:cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2",
    		"@xmlns": "urn:oasis:names:specification:ubl:schema:xsd:Invoice-2",
    		"cbc:CustomizationID": "urn:cen.eu:en16931:2017#conformant#urn:fdc:peppol.eu:2017:poacc:billing:international:aunz:3.0",
    		"cbc:ProfileID": "urn:fdc:peppol.eu:2017:poacc:billing:01:1.0",
    		"cbc:ID": "PRP100418",
    		"cbc:IssueDate": "29/04/2022",
    		"cbc:DueDate": "6/05/2022",
    		"cbc:InvoiceTypeCode": "380",
    		"cbc:Note": "110535 - DAF - XRef607 ICT Services - CRM Developer",
    		"cbc:DocumentCurrencyCode": "AUD",
    		"InvoiceLines": [{
    			"LineNumber": "1",
    			"Tax": "700"
    		}, {
    			"LineNumber": "2",
    			"Tax": "700"
    		}],
    		"ABNValue": "ABN 31 100 103 268"
    	}
    }]
    

    which the XML Formatter will turn into:

    <?xml version="1.0" encoding="UTF-8"?>
    <Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
       <cbc:CustomizationID>urn:cen.eu:en16931:2017#conformant#urn:fdc:peppol.eu:2017:poacc:billing:international:aunz:3.0</cbc:CustomizationID>
       <cbc:ProfileID>urn:fdc:peppol.eu:2017:poacc:billing:01:1.0</cbc:ProfileID>
       <cbc:ID>PRP100418</cbc:ID>
       <cbc:IssueDate>29/04/2022</cbc:IssueDate>
       <cbc:DueDate>6/05/2022</cbc:DueDate>
       <cbc:InvoiceTypeCode>380</cbc:InvoiceTypeCode>
       <cbc:Note>110535 - DAF - XRef607 ICT Services - CRM Developer</cbc:Note>
       <cbc:DocumentCurrencyCode>AUD</cbc:DocumentCurrencyCode>
       <InvoiceLines>
          <LineNumber>1</LineNumber>
          <Tax>700</Tax>
       </InvoiceLines>
       <InvoiceLines>
          <LineNumber>2</LineNumber>
          <Tax>700</Tax>
       </InvoiceLines>
       <ABNValue>ABN 31 100 103 268</ABNValue>
    </Invoice>
    

    which is what I believe you wanted.

19 Replies

    • deepak_shaw's avatar
      deepak_shaw
      Contributor

      Hi @marjan.karafiloski ,
      Thanks for your reply.
      The output is coming like:

      <?xml version='1.0' encoding='UTF-8'?>
      <DocumentRoot>
          <Document>
              <Metadata>
                  <asMap>
                      <global>
                          <doc_id>ae803607-d70c-11ec-9afc-6bf587c0d9c9</doc_id>
                      </global>
                  </asMap>
                  <id>ae803607-d70c-11ec-9afc-6bf587c0d9c9</id>
              </Metadata>
              <Data>
                  <xml>&lt;?xml version="1.0" encoding="UTF-8"?>&lt;Products>&lt;Product>&lt;ProductCode>A001&lt;/ProductCode>&lt;ProductName>DellLaptop&lt;/ProductName>&lt;Price>$400&lt;/Price>&lt;Country>US&lt;/Country>&lt;/Product>&lt;Product>&lt;ProductCode>A001&lt;/ProductCode>&lt;ProductName>DellLaptop&lt;/ProductName>&lt;Price>$500&lt;/Price>&lt;Country>AU&lt;/Country>&lt;/Product>&lt;Product>&lt;ProductCode>A001&lt;/ProductCode>&lt;ProductName>DellLaptop&lt;/ProductName>&lt;Price>$550&lt;/Price>&lt;Country>EU&lt;/Country>&lt;/Product>&lt;/Products></xml>
                  <original>
                      <products>
                          <ProductCode>A001</ProductCode>
                          <ProductName>DellLaptop</ProductName>
                          <Price>$400</Price>
                          <Country>US</Country>
                      </products>
                      <products>
                          <ProductCode>A001</ProductCode>
                          <ProductName>DellLaptop</ProductName>
                          <Price>$500</Price>
                          <Country>AU</Country>
                      </products>
                      <products>
                          <ProductCode>A001</ProductCode>
                          <ProductName>DellLaptop</ProductName>
                          <Price>$550</Price>
                          <Country>EU</Country>
                      </products>
                  </original>
              </Data>
          </Document>
      </DocumentRoot>
      

      With some additional tags, how to remove them?
      One more thing How I can introduce namespaces in output xml with the approch you have suggested. Sample xml with output is like:

      <?xml version="1.0" encoding="UTF-8"?>
      <ProductCode xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
          xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
          xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2">
      	<cbc:ProductName>
      		<cac:CostLocation>
      			<cbc:country>US</cbc:country>
      			<cbc:Cost>$400</cbc:Cost>
      		</cac:CostLocation>
      		<cac:CostLocation>
      			<cbc:country>AU</cbc:country>
      			<cbc:Cost>$500</cbc:Cost>
      		</cac:CostLocation>
      		<cac:CostLocation>
      			<cbc:country>EU</cbc:country>
      			<cbc:Cost>$550</cbc:Cost>
      		</cac:CostLocation>
      	</cbc:ProductName>
      </cbc:ProductCode>
      

      Looking forward for your suggestions.
      Regards,
      Deepak Shaw

      • robin's avatar
        robin
        Former Employee

        The Snap has a DocumentRoot value in the “Root Element” setting

  • robin's avatar
    robin
    Former Employee

    @marjan.karafiloski has provided a solution that leverages the Expression Language and the XML Generator and it results in the desired XML returned as a String in the output document. Thank you Marjan!

    An alternative way to doing this is to use the Gate Snap with an XML Formatter.

    But first…

    When going from CSV via JSON to XML, it’s often useful to first experiment with JSON Generator + XML Formatter to figure out what input JSON structure equates to the desired XML.

    In this case, to get XML that looks like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <ProductCode>
       <ProductName>
          <CostLocation>
             <country>US</country>
             <Cost>$400</Cost>
          </CostLocation>
          <CostLocation>
             <country>AU</country>
             <Cost>$500</Cost>
          </CostLocation>
          <CostLocation>
             <country>EU</country>
             <Cost>$550</Cost>
          </CostLocation>
       </ProductName>
    </ProductCode>
    

    you need JSON that looks like this:

    [
        {
            "ProductCode": {
                "ProductName": {
                    "CostLocation": [
                        {
                            "country": "US",
                            "Cost": "$400"
                        },
                        {
                            "country": "AU",
                            "Cost": "$500"
                        },
                        {
                            "country": "EU",
                            "Cost": "$550"
                        }
                    ]
                }
            }
        }
    ]
    

    So your goal should be to gather your stream of multiple documents in a way that allows you get to that single JSON Document above.

    Breakdown:

    The first Snap is a Constant to simulate the original CSV (pipe-delimited and pipe-enclosed):

    Then a CSV Parser to convert it to Document-format and to use the pipe character as a delimiter:

    Now a Mapper to drop all the values we don’t care about (and rename the fields we do care about):

    Next a Gate Snap to gather the 3 documents into a single document:

    [{
    	"input0": [{
    		"country": "US",
    		"Cost": "$400"
    	}, {
    		"country": "AU",
    		"Cost": "$500"
    	}, {
    		"country": "EU",
    		"Cost": "$550"
    	}]
    }]
    

    Now another Mapper to structure the JSON so it can be converted to the XML we want:

    Finally a standard XML Formatter:

    And then you have your desired XML in a binary output format, ready to be written to a File etc

    <?xml version="1.0" encoding="UTF-8"?>
    <ProductCode>
       <ProductName>
          <CostLocation>
             <country>US</country>
             <Cost>$400</Cost>
          </CostLocation>
          <CostLocation>
             <country>AU</country>
             <Cost>$500</Cost>
          </CostLocation>
          <CostLocation>
             <country>EU</country>
             <Cost>$550</Cost>
          </CostLocation>
       </ProductName>
    </ProductCode>
    
    • deepak_shaw's avatar
      deepak_shaw
      Contributor

      Hi Robin,
      Great solution and almost no coding involved.
      Can you please help me know to introduce the namespaces into xml, the output looks like:

      <?xml version="1.0" encoding="UTF-8"?>
      <ProductCode xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
          xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
          xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2">
      	<cbc:ProductName>
      		<cac:CostLocation>
      			<cbc:country>US</cbc:country>
      			<cbc:Cost>$400</cbc:Cost>
      		</cac:CostLocation>
      		<cac:CostLocation>
      			<cbc:country>AU</cbc:country>
      			<cbc:Cost>$500</cbc:Cost>
      		</cac:CostLocation>
      		<cac:CostLocation>
      			<cbc:country>EU</cbc:country>
      			<cbc:Cost>$550</cbc:Cost>
      		</cac:CostLocation>
      	</cbc:ProductName>
      </cbc:ProductCode>
      

      Thanks,
      Deepak Shaw

      • robin's avatar
        robin
        Former Employee

        You’ll use our JSON-to-XML conventions, which are basically:

        {
            "root": {
                "elementName": {
                    "@attributeName": "attributeValue",
                    "$": "elementValue"
                },
                "repeatedElement": [
                    {
                        "key": {
                            "@attr": "attr1",
                            "$": "key1"
                        },
                        "value": "value1"
                    },
                    {
                        "key": {
                            "@attr": "attr2",
                            "$": "key2"
                        },
                        "value": "value2"
                    }
                ]
            }
        }
        

        That input Document into the XML Formatter will generate the following XML:

        <?xml version="1.0" encoding="UTF-8"?>
        <root>
           <elementName attributeName="attributeValue">elementValue</elementName>
           <repeatedElement>
              <key attr="attr1">key1</key>
              <value>value1</value>
           </repeatedElement>
           <repeatedElement>
              <key attr="attr2">key2</key>
              <value>value2</value>
           </repeatedElement>
        </root>
        

        So, for the sample I provided above, the following should give you enough information to get to your final desired XML:

        [
            {
                "ProductCode": {
                    "@xmlns:cac": "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2",
                    "@xmlns:cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2",
                    "@xmlns": "urn:oasis:names:specification:ubl:schema:xsd:Invoice-2",
                    "cbc:ProductName": {
                        "cac:CostLocation": [
                            {
                                "cbc:country": "US",
                                "cbc:Cost": "$400"
                            },
                            {
                                "cbc:country": "AU",
                                "cbc:Cost": "$500"
                            },
                            {
                                "cbc:country": "EU",
                                "cbc:Cost": "$550"
                            }
                        ]
                    }
                }
            }
        ]
        

        which results in:

        <?xml version="1.0" encoding="UTF-8"?>
        <ProductCode xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
           <cbc:ProductName>
              <cac:CostLocation>
                 <cbc:country>US</cbc:country>
                 <cbc:Cost>$400</cbc:Cost>
              </cac:CostLocation>
              <cac:CostLocation>
                 <cbc:country>AU</cbc:country>
                 <cbc:Cost>$500</cbc:Cost>
              </cac:CostLocation>
              <cac:CostLocation>
                 <cbc:country>EU</cbc:country>
                 <cbc:Cost>$550</cbc:Cost>
              </cac:CostLocation>
           </cbc:ProductName>
        </ProductCode>
        

        Go from there to finish the rest of the XML you want.

  • Hi @deepak.shaw

    Could you please provide the elements’ structure of the XML file, so that we may be able to gain better insight as to which transformations and actions should be undertaken to resolve your particular use-case.

    Thanks,
    Dimitri

    • deepak_shaw's avatar
      deepak_shaw
      Contributor
      <?xml version="1.0" encoding="UTF-8"?>
      <ProductCode>
      	<ProductName>
      		<CostLocation>
      			<country>US</country>
      			<Cost>$400</Cost>
      		</CostLocation>
      		<CostLocation>
      			<country>AU</country>
      			<Cost>$500</Cost>
      		</CostLocation>
      		<CostLocation>
      			<country>EU</country>
      			<Cost>$550</Cost>
      		</CostLocation>
      	</ProductName>
      </ProductCode>
      
  • robin's avatar
    robin
    Former Employee

    I don’t believe your XML was attached correctly - please try again