cancel
Showing results for 
Search instead for 
Did you mean: 

Comparing the schema and discard the Object

mohit_jain
New Contributor

Hi Team,

I have one scenario and want to solve it as soon as possible.
I have two files : Initial Load , Incremental Load

We have loaded the initial load data into the DB which contains the columns like : id, name, age, class

mohit_jain_0-1691659876373.png

But We have the incremental files on the daily basis but there are some case we have the extra column which we have to discard the value and alert to the team that column is added newly How we can do in the pipeline .
Incremental file columns:

mohit_jain_1-1691660036331.png

We have to give alert that this column is roll_number is added  and also discard the columns. We cant do manually because it is a daily job and me can user mapper in this case. 

Can anyone help me in this case.

Thanks in advance.
@koryknick @dmiller 

1 ACCEPTED SOLUTION

@mohit_jain the solution is very straight forward.
1. Get the header from your initial file.
2. Merge it with your incremental file.
3. Filter match and unmatched in two different output streams.

Download solution pipeline Compare_Schema_Discard_load & below is the screenshot.

Screenshot 2023-08-17 131936.png

View solution in original post

14 REPLIES 14

@mohit_jain the solution is very straight forward.
1. Get the header from your initial file.
2. Merge it with your incremental file.
3. Filter match and unmatched in two different output streams.

Download solution pipeline Compare_Schema_Discard_load & below is the screenshot.

Screenshot 2023-08-17 131936.png

Thank you so much @virender_prajap 
It works for me.....  😊

Hi @virender 

Do we have can compare the json which is coming into the string also with the same scenario
We have to compare 2 things the col_name and the data type of the json and fetch out the unmatched value

Let me give you one example:-
Original Json :- It contains the col_name and the data_type of the column name
[{"metadata":"{col_name=ID_CODE, data_type=varchar(32768)},{col_name=ABC_CODE, data_type=varchar(32768)},{col_name=PSD_CODE, data_type=varchar(32768)},{col_name=NUM_REGIST, data_type=varchar(32768)},{col_name=PROD_ORIG_NAME1, data_type=varchar(32768)}"}]

And the updated one is 
[{"metadata":"{col_name=ID_CODE, data_type=varchar(32768)},{col_name=ABC_CODE, data_type=varchar(32768)},{col_name=PSD_CODE, data_type=varchar(32768)},{col_name=NUM_REGIST, data_type=varchar(32768)},{col_name=PROD_ORIG_NAME1, data_type=varchar(32768)},{col_name=PROD_ORIG_NAME2, data_type=varchar(32768)}"}]
Now here we are getting the {col_name=PROD_ORIG_NAME2, data_type=varchar(32768)} as the extra column and we have to fetch the out to unmacted value.
Can you please help me in this scenario also and please make this note we have to compaire both the col_name and data_type also.

Really thanks in advace.




 

darshthakkar
Valued Contributor

@mohit_jain: Would you always get data in the order A, B, C, D?

How is the data sorted? Is it alphabetical? Reason being, if you have columns as "Address", "Birth", "City", "Name" and for there is a new column named "Cameo" then where would it go? Would it after "Name" or after "Birth"?

 

A simple solution would be to use a mapper snap after your CSV parser and define the fields in the mapper snap OR define the fields in your CSV Parser and enabling the error route so that whenever new columns are detected, those will error out and you can send notifications to the team for that additional column.

 

darshthakkar_0-1692032648776.png

darshthakkar_2-1692032699939.png


darshthakkar_1-1692032679619.png

Thank you.

Hi @darshthakkar 

There are around 100 files we have to process we can't do it manually on the daily basis. As well as there are no data sorted we are comparing two schema with two different files.

Thanks 

Mohit Jain