
Creating complex nested JSON from CSV file

adam_g
New Contributor II

I am reading a CSV/XLSX file (it could be an SQL table as well) with the following structure (file attached below as well):

| Branch | Year | Type | ParentNumber | CurrentNumber | Structure | Level | Language1 | Language1_Value | Language2 | Language2_Value | CreationDate | LastUpdate |
| 1234 | 2024 | AAA | NULL | 12342024010000 | 01 | 1 | NL | voornaam | EN | first name | 05/06/2024 | 05/06/2024 |
| 1234 | 2024 | AAA | 12342024010000 | 12342024010100 | 01.01 | 2 | NL | achternaam | EN | last name | 05/06/2024 | 05/06/2024 |
| 1234 | 2024 | AAA | NULL | 12342024020000 | 02 | 1 | NL | geboortedatum | EN | birthday | 05/06/2024 | 05/06/2024 |
| 1234 | 2024 | AAA | NULL | 12342024030000 | 03 | 1 | NL | geslacht | EN | gender | 05/06/2024 | 05/06/2024 |
| 1234 | 2024 | AAA | 12342024030000 | 12342024030100 | 03.01 | 2 | NL | opslaan | EN | save | 05/06/2024 | 05/06/2024 |
| 1234 | 2024 | AAA | 12342024030100 | 12342024030101 | 03.01.01 | 3 | NL | versturen | EN | send | 05/06/2024 | 05/06/2024 |
| 1234 | 2024 | AAA | 12342024030100 | 12342024030102 | 03.01.02 | 3 | NL | sluiten | EN | close | 05/06/2024 | 05/06/2024 |

I want to transform this table (which could be thousands of rows) into the following JSON structure (file attached below as well):

{
  "Branch": 1234,
  "Year": 2024,
  "Type": "AAA",
  "MaxLevel": 3,
  "Level1": [
    {
      "ParentNumber": "NULL",
      "CurrentNumber": 12342024010000,
      "Structure": "01",
      "Level": 1,
      "Translations": [
        {
          "Language1": "NL",
          "Language1_Value": "voornaam"
        },
        {
          "Language2": "EN",
          "Language2_Value": "first name"
        }
      ],
      "LastUpdate": "2024-05-06",
      "Level2": [
        {
          "ParentNumber": 12342024010000,
          "CurrentNumber": 12342024010100,
          "Structure": "01.01",
          "Level": 2,
          "Translations": [
            {
              "Language1": "NL",
              "Language1_Value": "achternaam"
            },
            {
              "Language2": "EN",
              "Language2_Value": "last name"
            }
          ],
          "LastUpdate": "2024-05-06"
        }
      ]
    },
    {
      "ParentNumber": "NULL",
      "CurrentNumber": 12342024020000,
      "Structure": "02",
      "Level": 1,
      "Translations": [
        {
          "Language1": "NL",
          "Language1_Value": "geboortedatum"
        },
        {
          "Language2": "EN",
          "Language2_Value": "birthday"
        }
      ],
      "LastUpdate": "2024-05-06"
    },
    {
      "ParentNumber": "NULL",
      "CurrentNumber": 12342024030000,
      "Structure": "03",
      "Level": 1,
      "Translations": [
        {
          "Language1": "NL",
          "Language1_Value": "geslacht"
        },
        {
          "Language2": "EN",
          "Language2_Value": "gender"
        }
      ],
      "LastUpdate": "2024-05-06",
      "Level2": [
        {
          "ParentNumber": 12342024030000,
          "CurrentNumber": 12342024030100,
          "Structure": "03.01",
          "Level": 2,
          "Translations": [
            {
              "Language1": "NL",
              "Language1_Value": "opslaan"
            },
            {
              "Language2": "EN",
              "Language2_Value": "save"
            }
          ],
          "LastUpdate": "2024-05-06",
          "Level3": [
            {
              "ParentNumber": 12342024030100,
              "CurrentNumber": 12342024030101,
              "Structure": "03.01.01",
              "Level": 3,
              "Translations": [
                {
                  "Language1": "NL",
                  "Language1_Value": "versturen"
                },
                {
                  "Language2": "EN",
                  "Language2_Value": "send"
                }
              ],
              "LastUpdate": "2024-05-06"
            },
            {
              "ParentNumber": 12342024030100,
              "CurrentNumber": 12342024030102,
              "Structure": "03.01.02",
              "Level": 3,
              "Translations": [
                {
                  "Language1": "NL",
                  "Language1_Value": "sluiten"
                },
                {
                  "Language2": "EN",
                  "Language2_Value": "close"
                }
              ],
              "LastUpdate": "2024-05-06"
            }
          ]
        }
      ]
    }
  ],
  "CreationDate": "2024-05-06",
  "LastUpdate": "2024-05-06"
}

Is this even possible in the first place?

Can someone help me out?

Thank you!

1 ACCEPTED SOLUTION

koryknick
Employee

@adam_g - since there is no defined maximum level, the solution uses some advanced techniques, which I will do my best to explain in detail. First, you should familiarize yourself with Expression Libraries, Arrow Functions, jsonPath, the Spread and Ternary Operators, Object Functions, and Array Functions such as map() and filter(). These are advanced topics that may take some time to feel comfortable with, but I will at least explain each of the functions used in the example pipeline I have attached.

Please download the attached zip and decompress it. Upload parentChild.expr to your SLDB in the same project where you import the pipeline.

The first snap after the Excel Parser is Group By Fields, which pivots the data from individual records into documents grouped by the complex (multi-field) primary key defined in the snap settings.
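For orientation, the grouped document coming out of that snap looks roughly like the sketch below; this shows only the first two input rows and the fields used later, and the exact shape depends on the snap settings (the target field holding the grouped rows is "Root", and the values arrive as strings from the parser):

{
  "groupBy": { "Branch": "1234", "Year": "2024", "Type": "AAA" },
  "Root": [
    { "ParentNumber": "NULL", "CurrentNumber": "12342024010000", "Structure": "01", "Level": "1", "Language1": "NL", "Language1_Value": "voornaam", "Language2": "EN", "Language2_Value": "first name", "LastUpdate": "05/06/2024" },
    { "ParentNumber": "12342024010000", "CurrentNumber": "12342024010100", "Structure": "01.01", "Level": "2", "Language1": "NL", "Language1_Value": "achternaam", "Language2": "EN", "Language2_Value": "last name", "LastUpdate": "05/06/2024" }
  ]
}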

Next is the Mapper, which first moves the $groupBy fields to the root level. Then "MaxLevel" is determined by applying the Math.max() function to the array of "Level" values. Note the use of the Spread Operator to explode the array into individual arguments so the method can evaluate them. jsonPath is used here to find all the "Level" values within the array that was just created by the Group By Fields snap, and Array.map() converts each "Level" from string to integer using the built-in parseInt() function.
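As a rough sketch, those pieces combine into an expression along these lines:

  Math.max(...jsonPath($, "$Root[*].Level").map(x => parseInt(x)))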

The "Level1" value is where some real magic happens.  Since we don't know how many levels deep we need to build, we need to use a bit of recursion to make it work.  This is where the expression library and arrow functions help us out. Note that the call is pretty simple, just calling the mapLevel() function in the library, passing it the "Root" array (built from the Group By Fields snap), the starting level number, and the "ParentNumber" that we want to start from - which in this case are those coming in as "NULL".  Be sure to include the parentChild.expr file in your SLDB in the same project where you import the pipeline - this expr file is where all the logic lies to build your JSON.

  mapLevel : (rootArray, level, pNum) =>

This simply defines the function "mapLevel" with 3 parameters, which we saw when we called it in the Mapper.

    [].concat(rootArray)

The first thing I want to do is make a copy of the array that is passed in.  Since we need to filter out only the parts of the array we're interested in based on ParentNumber, I don't want to accidentally affect my original array since I need to look through it multiple times for the different values we need to build.

.filter(elem=> typeof elem == 'object' && elem.ParentNumber == pNum)

This is the filter to find only the elements related to the ParentNumber we're interested in.

.map((elem,idx,arr)=> 

Now we are going to rebuild each array element (now filtered down to the ParentNumber we're working on) into the structure you defined.

.extend(

This is the Object.extend() method to add elements into the Object we just created for each array element. 

rootArray.findIndex(x=> x.ParentNumber == elem.CurrentNumber) < 0

We use the Array.findIndex() method to check whether there is another level to dive into, by looking for any element in the original array whose ParentNumber matches the CurrentNumber of the filtered element we're currently working on.
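To make that concrete with the sample data: the 02 node has no rows pointing at it, so the first check below is true and nothing extra is added, while the 03 node is referenced by the 03.01 row, so the second check is false and the recursive branch described next runs.

  rootArray.findIndex(x => x.ParentNumber == "12342024020000") < 0
  rootArray.findIndex(x => x.ParentNumber == "12342024030000") < 0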

: { ["Level" + (level + 1)] : __root__.mapLevel (rootArray, level + 1, elem.CurrentNumber) }

Finally, here is the real magic of this library... recursion! Thanks to Array.findIndex() and the ternary operator, we know when there is another level to dive into. So we build that next level by calling the "mapLevel" function again, this time passing the CurrentNumber of the filtered array element we're working on. The "__root__" designation just lets me access the function name again; see Object Literals for more information.
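Putting the pieces together, parentChild.expr looks roughly like the sketch below. The specific fields copied into each node (ParentNumber, CurrentNumber, Structure, Translations, LastUpdate) are reconstructed here from the target JSON in the question, so treat the attached expr file as the authoritative version:

{
  mapLevel : (rootArray, level, pNum) =>
    [].concat(rootArray)
      .filter(elem => typeof elem == 'object' && elem.ParentNumber == pNum)
      .map((elem, idx, arr) =>
        {
          "ParentNumber" : elem.ParentNumber,
          "CurrentNumber" : elem.CurrentNumber,
          "Structure" : elem.Structure,
          "Level" : level,
          "Translations" : [
            { "Language1" : elem.Language1, "Language1_Value" : elem.Language1_Value },
            { "Language2" : elem.Language2, "Language2_Value" : elem.Language2_Value }
          ],
          "LastUpdate" : elem.LastUpdate
        }.extend(
          rootArray.findIndex(x => x.ParentNumber == elem.CurrentNumber) < 0
            ? {}
            : { ["Level" + (level + 1)] : __root__.mapLevel(rootArray, level + 1, elem.CurrentNumber) }
        )
      )
}

When findIndex() returns -1, the ternary falls back to an empty object, so extend() adds nothing and the node stays a leaf; otherwise the recursive call builds the next "LevelN" array.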

Hope this helps!


6 REPLIES

adam_g
New Contributor II

Hi @koryknick,

It got a bit more complicated. The SQL database contains different Branch, Year, and Type values. The user can send Branch, Year, and Type values as parameters to filter the database. Branch and Year are optional, but Type is mandatory and must always be supplied as a parameter. If the user supplies only the Type, for example, I have to return the JSON objects for all Branch and Year combinations that have that Type value. For example, if the user sends Type="AAA", this could be the result:

[
  {
    "Branch": 123,
    "Year": 2023,
    "Type": "AAA",
    ...
    "Level1": [...],
    "CreationDate": "2024-05-06",
    "LastUpdate": "2024-06-17"
  },
  {
    "Branch": 9,
    "Year": 2023,
    "Type": "AAA",
    ...
    "Level1": [...],
    "CreationDate": "2024-05-06",
    "LastUpdate": "2024-06-17"
  },
  {
    "Branch": 1,
    "Year": 2024,
    "Type": "AAA",
    ...
    "Level1": [...],
    "CreationDate": "2024-05-06",
    "LastUpdate": "2024-06-17"
  },
  {
    "Branch": 7,
    "Year": 2017,
    "Type": "AAA",
    ...
    "Level1": [...],
    "CreationDate": "2024-05-06",
    "LastUpdate": "2024-06-17"
  },
  ...
]

The same logic applies if I get the other parameters as well. I basically have to filter the rows first and then, for every Branch and Year combination, do the library expression mapping as before.

Any idea what I should add or change to achieve this? I was thinking of using a child pipeline, but how would I then pass all the rows to the child pipeline...

Thank you so much in advance!

koryknick
Employee

@adam_g - it occurred to me over the weekend that this could also be solved using a JSON Generator snap and Apache Velocity Engine syntax. I don't have time to produce that example for you, but I have tested some very basic concepts and am confident it would work, using a Velocimacro to handle the recursive nature of the data.

You could also try using a Script snap to construct your hierarchical JSON from the array as I created it with the Group By Fields snap. Here is an article that might help with that. Keep in mind that you need to follow the template provided by the Script snap, but the logic to create the complex structure should be usable.
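As for the parameter filtering itself, one option is a Filter snap placed before the Group By Fields snap that evaluates the pipeline parameters directly. A minimal sketch, assuming pipeline parameters named _type, _branch, and _year (those names are my assumption), with the optional ones passed as empty strings when omitted:

  $Type == _type
    && (_branch == "" || $Branch.toString() == _branch)
    && (_year == "" || $Year.toString() == _year)

Since the Group By Fields snap already emits one grouped document per Branch / Year / Type combination, the downstream Mapper and expression library should then run once per combination without further changes, provided the group-by key includes those three fields.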