cancel
Showing results for 
Search instead for 
Did you mean: 

Creating complex nested JSON from CSV file

adam_g
New Contributor II

I am reading a CSV/XLSX file (could be an sql table as well) with the follwing structure (file attached below as well):

BranchYearTypeParentNumberCurrentNumberStructureLevelLanguage1Language1_ValueLanguage2Language2_ValueCreationDateLastUpdate
12342024AAANULL12342024010000011NLvoornaamENfirst name05/06/202405/06/2024
12342024AAA123420240100001234202401010001.012NLachternaamENlast name05/06/202405/06/2024
12342024AAANULL12342024020000021NLgeboortedatumENbirthday05/06/202405/06/2024
12342024AAANULL12342024030000031NLgeslachtENgender05/06/202405/06/2024
12342024AAA123420240300001234202403010003.012NLopslaanENsave05/06/202405/06/2024
12342024AAA123420240301001234202403010103.01.013NLversturenENsend05/06/202405/06/2024
12342024AAA123420240301001234202403010203.01.023NLsluitenENclose05/06/202405/06/2024

I want to transform this table (could be thousands of rows) to the following JSON structure(file attached below as well):

{
"Branch": 1234,
"Year": 2024,
"Type": "AAA",
"MaxLevel": 3,
"Level1": [
{
"ParentNumber": "NULL",
"CurrentNumber": 12342024010000,
"Structure": "01",
"Level": 1,
"Translations": [
{
"Language1": "NL",
"Language1_Value": "voornaam"
},
{
"Language2": "EN",
"Language2_Value": "first name"
}
],
"LastUpdate": "2024-05-06",
"Level2": [
{
"ParentNumber": 12342024010000,
"CurrentNumber": 12342024010100,
"Structure": "01.01",
"Level": 2,
"Translations": [
{
"Language1": "NL",
"Language1_Value": "achternaam"
},
{
"Language2": "EN",
"Language2_Value": "last name"
}
],
"LastUpdate": "2024-05-06"
}
]
},
{
"ParentNumber": "NULL",
"CurrentNumber": 12342024020000,
"Structure": "02",
"Level": 1,
"Translations": [
{
"Language1": "NL",
"Language1_Value": "geboortedatum"
},
{
"Language2": "EN",
"Language2_Value": "birthday"
}
],
"LastUpdate": "2024-05-06"
},
{
"ParentNumber": "NULL",
"CurrentNumber": 12342024030000,
"Structure": "03",
"Level": 1,
"Translations": [
{
"Language1": "NL",
"Language1_Value": "geslacht"
},
{
"Language2": "EN",
"Language2_Value": "gender"
}
],
"LastUpdate": "2024-05-06",
"Level2": [
{
"ParentNumber": 12342024030000,
"CurrentNumber": 12342024030100,
"Structure": "03.01",
"Level": 2,
"Translations": [
{
"Language1": "NL",
"Language1_Value": "opslaan"
},
{
"Language2": "EN",
"Language2_Value": "save"
}
],
"LastUpdate": "2024-05-06",
"Level3": [
{
"ParentNumber": 12342024030100,
"CurrentNumber": 12342024030101,
"Structure": "03.01.01",
"Level": 3,
"Translations": [
{
"Language1": "NL",
"Language1_Value": "versturen"
},
{
"Language2": "EN",
"Language2_Value": "send"
}
],
"LastUpdate": "2024-05-06"
},
{
"ParentNumber": 12342024030100,
"CurrentNumber": 12342024030102,
"Structure": "03.01.02",
"Level": 3,
"Translations": [
{
"Language1": "NL",
"Language1_Value": "sluiten"
},
{
"Language2": "EN",
"Language2_Value": "close"
}
],
"LastUpdate": "2024-05-06"
}
]
}
]
}
],
"CreationDate": "2024-05-06",
"LastUpdate": "2024-05-06"
}

Is this even possible in the first place?

Can someone help me out?

Thank you!

1 ACCEPTED SOLUTION

koryknick
Employee
Employee

@adam_g - since there is no defined max level, the solution uses some advanced methodology and I will do my best to explain in some detail.  First, you should familiarize yourself with Expression Libraries, Arrow Functions, jsonPath, the Spread and Ternary Operators, Object Functions, and Array Functions such as map() and filter().  These are advanced topics that may take you some time to feel comfortable with, and I will at least explain each of the functions that I used in the example pipeline that I have attached.

Please download the zip and decompress it.  Upload the parentChild.expr into your SLDB into the same project where you import the pipeline.

The first snap after the Excel Parser is Group By Fields which simply pivots the data from individual records to being grouped by the specified complex primary key defined in the settings.

Next is the Mapper, which first moves the $groupBy fields to the root level.  Then the "MaxLevel" is determined by using the Math.max() function against the array of "Level" values.  Note the use of the Spread Operator to explode the array into individual values for the method to work from.  jsonPath is used here to find all the "Level" values found within the array that was just created by the Group By Fields snap.  Array.map() is also used to convert the "Level" from string to integer using the built-in parseInt() function.

The "Level1" value is where some real magic happens.  Since we don't know how many levels deep we need to build, we need to use a bit of recursion to make it work.  This is where the expression library and arrow functions help us out. Note that the call is pretty simple, just calling the mapLevel() function in the library, passing it the "Root" array (built from the Group By Fields snap), the starting level number, and the "ParentNumber" that we want to start from - which in this case are those coming in as "NULL".  Be sure to include the parentChild.expr file in your SLDB in the same project where you import the pipeline - this expr file is where all the logic lies to build your JSON.

  mapLevel : (rootArray, level, pNum) =>

This simply defines the function "mapLevel" with 3 parameters, which we saw when we called it in the Mapper.

    [].concat(rootArray)

The first thing I want to do is make a copy of the array that is passed in.  Since we need to filter out only the parts of the array we're interested in based on ParentNumber, I don't want to accidentally affect my original array since I need to look through it multiple times for the different values we need to build.

.filter(elem=> typeof elem == 'object' && elem.ParentNumber == pNum)

This is the filter to find only the elements related to the ParentNumber we're interested in.

.map((elem,idx,arr)=> 

Now we are going to rebuild each array element (now filtered down to the ParentNumber we're working on) into the structure you defined.

.extend(

This is the Object.extend() method to add elements into the Object we just created for each array element. 

rootArray.findIndex(x=> x.ParentNumber == elem.CurrentNumber) < 0

We use the Array.findIndex() method to check to see if there is another level we need to dive into by checking to see if there are any elements in the original array where the ParentNumber is the same as the CurrentNumber value of the filtered array we're currently working in.

: { ["Level" + (level + 1)] : __root__.mapLevel (rootArray, level + 1, elem.CurrentNumber) }

Finally, here is the real magic of this library... recursion!  Thanks to the Array.findIndex() and ternary operator, we know there is another level to dive into.  So we'll build that next level by calling the "mapLevel" function again, but this time passing the CurrentNumber value of the filtered array element we're working in.  The "__root__" designation just let's me access the function name again, see Object Literals for more information.

Hope this helps!

View solution in original post

5 REPLIES 5

koryknick
Employee
Employee

@adam_g - couple questions to clarify the requirements as it will affect how this needs to be solved.

  1. How many levels are possible?  If it always maxes out at 3, the solution is a bit simpler - if it is completely arbitrary and could extend into the dozens, then it is a more complicated solution
  2. Will Branch, Year, and Type always be the key components of your record sets?

Happy Friday!  Hopefully we can get you a solution quickly!

adam_g
New Contributor II

Hi @koryknick,
Thank you for your commenting!

  1. Unfortunately, MaxLevel is not pre-set nor is there a hard limit. It is simply the maximum value of the Level column from the CSV/SQL table.
  2. Yes, all fields in the JSON and the SQL table will always be present. The JSON structure, including fields at the root level or deeper levels, will remain consistent. The only variation is the depth of the levels or whether a row contains a deeper level.

I hope this answered your questions. English is not my native language, it could be that I misunderstood the 2nd question.

Happy Friday to you too!

koryknick
Employee
Employee

@adam_g - since there is no defined max level, the solution uses some advanced methodology and I will do my best to explain in some detail.  First, you should familiarize yourself with Expression Libraries, Arrow Functions, jsonPath, the Spread and Ternary Operators, Object Functions, and Array Functions such as map() and filter().  These are advanced topics that may take you some time to feel comfortable with, and I will at least explain each of the functions that I used in the example pipeline that I have attached.

Please download the zip and decompress it.  Upload the parentChild.expr into your SLDB into the same project where you import the pipeline.

The first snap after the Excel Parser is Group By Fields which simply pivots the data from individual records to being grouped by the specified complex primary key defined in the settings.

Next is the Mapper, which first moves the $groupBy fields to the root level.  Then the "MaxLevel" is determined by using the Math.max() function against the array of "Level" values.  Note the use of the Spread Operator to explode the array into individual values for the method to work from.  jsonPath is used here to find all the "Level" values found within the array that was just created by the Group By Fields snap.  Array.map() is also used to convert the "Level" from string to integer using the built-in parseInt() function.

The "Level1" value is where some real magic happens.  Since we don't know how many levels deep we need to build, we need to use a bit of recursion to make it work.  This is where the expression library and arrow functions help us out. Note that the call is pretty simple, just calling the mapLevel() function in the library, passing it the "Root" array (built from the Group By Fields snap), the starting level number, and the "ParentNumber" that we want to start from - which in this case are those coming in as "NULL".  Be sure to include the parentChild.expr file in your SLDB in the same project where you import the pipeline - this expr file is where all the logic lies to build your JSON.

  mapLevel : (rootArray, level, pNum) =>

This simply defines the function "mapLevel" with 3 parameters, which we saw when we called it in the Mapper.

    [].concat(rootArray)

The first thing I want to do is make a copy of the array that is passed in.  Since we need to filter out only the parts of the array we're interested in based on ParentNumber, I don't want to accidentally affect my original array since I need to look through it multiple times for the different values we need to build.

.filter(elem=> typeof elem == 'object' && elem.ParentNumber == pNum)

This is the filter to find only the elements related to the ParentNumber we're interested in.

.map((elem,idx,arr)=> 

Now we are going to rebuild each array element (now filtered down to the ParentNumber we're working on) into the structure you defined.

.extend(

This is the Object.extend() method to add elements into the Object we just created for each array element. 

rootArray.findIndex(x=> x.ParentNumber == elem.CurrentNumber) < 0

We use the Array.findIndex() method to check to see if there is another level we need to dive into by checking to see if there are any elements in the original array where the ParentNumber is the same as the CurrentNumber value of the filtered array we're currently working in.

: { ["Level" + (level + 1)] : __root__.mapLevel (rootArray, level + 1, elem.CurrentNumber) }

Finally, here is the real magic of this library... recursion!  Thanks to the Array.findIndex() and ternary operator, we know there is another level to dive into.  So we'll build that next level by calling the "mapLevel" function again, but this time passing the CurrentNumber value of the filtered array element we're working in.  The "__root__" designation just let's me access the function name again, see Object Literals for more information.

Hope this helps!

adam_g
New Contributor II

Hi @koryknick,

This is absolutely amazing! Thank you so much! I never would have figured this out on my own.

Your explanation is super helpful. I'll dive into all the links you shared.

Cheers!