JSON Splitter for an Split column

Hi,
I have a requirement to read multiple XML files data and load into Oracle table. I am using Directory browser to read the list of XML files and passing the path to File reader and then XML Parser, from XML parser the output is given to mapper there i am splitting the data on a column using xx.split(’\n’). The output from mapper is given to JSON Splitter to split on xxx, this is working good for one XML file, if i try to process for multiple XML files the JSON split is not working as excepted.

image

Even i tried by creating child pipeline and passing path value, but i am getting same results. it is splitting one XML properly and other XML with only one record.

Please some one can advice here.

As per my understanding, the Directory browser will execute one file at a time, please correct me if i am wrong.

Hi @pranil2k2

The Directory Browser lists all the files from the specific location based on the given criteria, and then the File Reader will process one file at a time.

One possible reason why JSON Splitter is working only for one file it can be that the other XML files does not have the same structure as that file(which the JSON Splitter works properly for), or that “xx”(split) column does not exist in the other files, or it may be empty string or null.

You should check/handle those scenarios before the splitting logic :slight_smile: .

Regards,
Spiro Taleski

1 Like

HI Taleski,

Thanks for the reply. to avoid those confusions/checks i have taken only two files that have complete data. i have executed the files individually and it is getting all results as shown below. Here the file list is stored in excel and act as input to directory browser and when trying to execute two XMLs the JSON splitter is not splitting as excepted, it is reading only one record from first XML. This is where i need advice… is reading file names from excel is not correct one? is something wrong here. please advice.
XML one


XML two

Reading two XMLs.


Please some one advice here…

Thanks

There’s an inherent ambiguity with converting XML to a JSON-like structure with regard to knowing which elements should be converted to an array. When there are multiple adjacent elements with the same name, that’s obviously an array. But in the case where there’s only one element with that name, should that be an array with one element, or just an object?

The key to solving this issue is to call the sl.ensureArray function in the Mapper right before the JSON Splitter. Map the path you’re using for the split expression to sl.ensureArray(path). Use that when you know that the path is supposed to represent an array. When the XML has more than one element for that path, the value passed to that function will already be an array, and the function will just return that array without modification. But when the output of the XML parser has a single object at that path, the function will wrap that object in an array and return that. So it will work in both the single- and multiple-element cases, because it ensures that the JSON Splitter will always operate on an array.

Hi Ptaylor,

As i mentioned above, i am applying split function on the xx column and that is generating an array. i am very sure the JSON is getting Array as input. i checked multiple times and as said above it is working if i execute the job for single file.

Thanks

I applied sl.ensureArray function and the column become an array, but still the problem exist as shown in above images

Note: The interesting part here is, i am passing file names to Directory browser which are saved in Excel file. if i remove excel file input and configure the Direcotry browser to read two files the problem is not coming the splitter is working as excepted. so is there any problem to pass file names from excel?

It’s not about whether there is one file or multiple files. The files will be parsed individually. It’s about what the data looks like in a single file. I’m guessing that things work fine when the data contains multiple elements at that path, since that will be parsed as an array, but fails when the data contains a single element at that path. You haven’t provided any detail about the failures – what errors are you seeing and what does the data look like for the failure cases?

Hi,

Ok, the data before JSON splitter… it has two rows
row1, row 2

The excepted output should be atleast more than one row in out put for two rows, but i am getting only one document as output for one of the xml

the excepted output is other columns also like version number,etc.
other xml is splitting as excepted.

As said earlier i didn’t find any issue if i execute pipeline for single files… i know as per logic it has to work because it is an array but i don’t know why it not splitting as excepted when i am trying to execute with multiple files…

Are you able to share some sample input files and an export of your pipeline?

It might help us understand what you’re trying to do and the issue you are running into.

i can provide the pipeline but the sample files may be data privacy issue.
EMA_Load_XML_Data_to_Table_2021_03_29.slp (16.5 KB)

as i explained, i have XML files, in XML we have a x column which need to split on /n (new line character), which i did it in mapper using split function. by default the output of split function is an Array (i also used sl.ensurearray function) and then i passed the column to JSON splitter. The splitter has to split array elements, but for multiple files the splitting is not happening as excepted. sorry, i cant share files is any other way i can share the screen?

Can you simplify the XML and mask the sensitive data? Without seeing something of the structure you are working with, I’m not sure how we might be able to help.

Thanks for the help. i didnt do any changes but now the JSON splitter is working as excepted. i just delete every thing and re created.
Thanks again