Forum Discussion

mohit_jain
New Contributor III
3 years ago

Re: Comparing the schema and discarding the object

Hi Team,

Can anyone help with finding the extra values when the schema is different?
Let me explain with an example:
We have loaded the initial load into the table. The initial load contains 4 columns: ID, Name, Age, Class.

Now we have the incremental load, and it contains 5 columns: ID, Name, Age, Class, as well as a newly added Roll_Number.

We have to discard the Roll_Number values and send an alert that this column was newly added in this file.
We also don't want to do this manually using a Mapper; we have to schedule the pipeline on a daily basis, and there are around 100 files whose data we have to process.

Can anyone help to create a dynamic pipeline for all the files?
Thanks in advance
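
The core of the requirement is a header diff. Here is a minimal Python sketch of that check, assuming the files are CSVs; the file name and the expected-column list below are hypothetical:

```python
import csv

EXPECTED_COLUMNS = ["ID", "Name", "Age", "Class"]  # assumed known schema

def find_extra_columns(path: str) -> list[str]:
    """Return header columns present in the file but not in the expected schema."""
    with open(path, newline="") as f:
        header = next(csv.reader(f))
    return [col for col in header if col not in EXPECTED_COLUMNS]

# e.g. returns ["Roll_Number"] for the incremental file described above
extra = find_extra_columns("incremental_load.csv")  # hypothetical file name
if extra:
    print(f"New column(s) detected: {extra}")
```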


9 Replies

  • darshthakkar
    Valued Contributor

    mohit_jain: Does the value change every day on the existing records? The rationale behind asking is that you could go ahead with Truncate and Load in the Snowflake Bulk Load snap.

    I had a similar use case wherein I was getting a dynamic payload, but that would ingest "Null" values into the existing records for the new columns, and you wouldn't be able to troubleshoot when the records were first ingested.

    So we asked the business to define the payload (which happened for a few integrations), and for the rest we are ingesting JSON into Snowflake, which then handles it dynamically. Hope this helps!

    • mohit_jain
      New Contributor III

      Hi darshthakkar,

      We cannot do a truncate and load on a daily basis. The only requirement is to find any extra column we may get in the future, handle it, and send a notification that we received an extra column.

      Thanks 

  • darshthakkar
    Valued Contributor

    Got it. You'll have to use a couple of snaps to design this.

    Have a defined payload coming from the source and check for any additional columns (this can be achieved via a Filter/Router snap). If there is no change, the data can go to the downstream system for ingestion.

    If there is a change, copy the output into 2 different streams: one to notify the group that a new column was found in the payload, and the other to ingest into the downstream system in such a way that you run an ALTER TABLE query to add the column on top of ingesting data for the existing columns (see the sketch below).

    This use case will require a lot of testing on your side, and it can also be achieved with a parent-child pipeline. I would urge you to test it without parent-child first (if you've never used it before) and then replicate the functionality in parent-child, so that you know where and how the data is flowing.
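
    A rough Python sketch of that routing logic, assuming the expected schema is known up front; the table name and the notify/load/SQL helpers are hypothetical stand-ins for the email, Snowflake Execute, and Bulk Load snaps:

    ```python
    import csv

    EXPECTED = ["ID", "Name", "Age", "Class"]
    TABLE = "STUDENTS"  # hypothetical target table

    def process(path: str) -> None:
        # Read only the header row of the incoming CSV.
        with open(path, newline="") as f:
            header = next(csv.reader(f))
        extra = [c for c in header if c not in EXPECTED]
        if extra:
            # Change detected: one stream notifies, the other alters the table.
            notify(f"New column(s) {extra} found in {path}")
            for col in extra:
                # CSV columns arrive untyped, so VARCHAR is assumed here.
                run_sql(f"ALTER TABLE {TABLE} ADD COLUMN {col} VARCHAR")
        load_existing_columns(path)  # ingest the known columns either way

    def notify(msg): print("ALERT:", msg)                  # stand-in for an email snap
    def run_sql(stmt): print("SQL:", stmt)                 # stand-in for Snowflake Execute
    def load_existing_columns(path): print("LOAD:", path)  # stand-in for Bulk Load

    process("daily_file.csv")  # hypothetical daily input
    ```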

    Thanks.

    • mohit_jain
      New Contributor III

      Hi alchemiz,

      Thanks for your reply.
      The only thing is that here we are not getting the headers from a pipeline parameter (there are no parent and child pipelines in this). We have 2 files:
      1. Initial file - already dropped into the SFTP location.
      2. Incremental file - will come on a daily basis.
      We have to compare the headers of both files, fetch the extra column out, and load the remaining columns.
      I am attaching the two files for your reference; can you please run them through the pipeline and fetch the extra column out?
      I am also attaching the pipeline for your reference.
      Initial Load:

      Incremental Load: 

      Here we have to discard column E and load columns A, B, C, and D. We also have to fetch column E and send an alert that the file contains an extra column E (see the sketch below).
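
      In plain Python, the comparison described above could look like the following sketch; the file names are hypothetical, and csv.DictWriter's extrasaction="ignore" is what discards column E while A, B, C, and D are written through:

      ```python
      import csv

      def filter_incremental(initial_path, incremental_path, out_path):
          # The header of the initial file defines the columns to keep (A, B, C, D).
          with open(initial_path, newline="") as f:
              keep = next(csv.reader(f))
          with open(incremental_path, newline="") as f:
              reader = csv.DictReader(f)
              extra = [c for c in reader.fieldnames if c not in keep]  # e.g. ["E"]
              with open(out_path, "w", newline="") as out:
                  writer = csv.DictWriter(out, fieldnames=keep, extrasaction="ignore")
                  writer.writeheader()
                  writer.writerows(reader)  # extra columns are silently dropped
          return extra

      extras = filter_incremental("initial.csv", "incremental.csv", "filtered.csv")
      if extras:
          print(f"Alert: extra column(s) {extras} found in the incremental file")
      ```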

      Pipeline Design:

      Thanks 

      Mohit Jain


      • mohit_jain
        New Contributor III

        Hi alchemiz,
        Have you had a chance to look into my reply?
        Thanks,
        Mohit Jain