Forum Discussion

PayalS's avatar
PayalS
New Contributor II
6 years ago
Solved

Help with DIFF SNAP

Hi,

I am looking for some help regarding DIFF Snap.
Below is the scenario, I am working on:

I have two Source DBs - A & B. DIFF snap is being used to identify the eligible data for Insert, Update and Delete, and accordingly operations performed in Source B as Target DB.
e.g.
Lets say 10 records are there in A
20 records are there in B
then DIFF will identify 10 records for deletion from B.

Now, the problem I am facing is,
if due to any data issue or connection failure, if no records are coming from Source A, then DIFF is identifying all records from B to flow in the deletion link, which eventually deleting all the data from B.

I dont want this to happen. I want my pipeline to stop/fail, if any such error happend in the source, so that target cannot be empty in any case.
I am using Error Pipeline, which is tracking the error fine but then not stopping the pipeline.

Is there a way, I can track the error and stop the pipeline in such scenarios of connection failure or Data Issue ?

Quick Help will be really appreciated.

Thanks in Advance,
Payal Srivastava

  • @koryknick @Spiro_Taleski
    Thanks for your valuable comments šŸ™‚ šŸ™
    I just wanna share the good news that I am finally able to crack this now.
    PFB workarounds I did to make it a success:

    1. Updated Error pipeline to insert records in the DB.
    2. filtered data connection specific errors.
    3. Used exit snap with threshold value as ā€˜0’.
    4. Created a separate account with batch size as 1 for this error pipeline.

    By this, my error pipeline will stop the parent pipeline only for particular errors, and will continue the pipeline in all other data failures.

    Thanks,
    Payal Srivastava

14 Replies

  • Well done @PayalS! I’m sorry that my advice didn’t get you to this sooner, but I’m very glad for you that you found such a workable solution. And that you didn’t need to update 70 pipelines!

  • SpiroTaleski's avatar
    SpiroTaleski
    Valued Contributor

    Hi @PayalS

    You can try by splitting the pipeline in two pipelines(parent/child).

    In the parent pipeline you can add a logic where you will check if there are records coming from Source A or not(ex:count if number of records is bigger than 0).

    In the child pipeline you can add the DIFF snap together with Source B.

    If there are no records from Source A then do not call the child pipeline(ex: by using Filter Snap that will not proceed pipeline execution and call child pipeline if there are no records on input).

    This is just an idea that you can try and check if it works for you.

    Regards,
    Spiro Taleski

    • PayalS's avatar
      PayalS
      New Contributor II

      Thanks Spiro for the quick reply. Much Appreciated. šŸ™

      Yeah, I too thought the same approach, but then in our project we do have a total count of 70 pipelines (which are already created and we identified this bug later). Now, with this change, we will have to modify all 70 pipelines and along with that we will have to create another set of 70 pipelines.

      Therefore, I was thinking if we could have any alternate approach to fix the same. May be by just modifying the error pipeline or may a small fix which can be done only to already created 70 pipelines.

      • SpiroTaleski's avatar
        SpiroTaleski
        Valued Contributor

        Thanks @PayalS

        The other approach that I was thinking is replacing the DIFF snap and implementing custom logic for identifying which data should be inserted, updated or deleted.

        But, I assume that this approach will require big changes on the existing pipelines as well.

        Regards,
        Spiro Taleski

  • You can examine the error details in your Error Pipeline and call the Exit snap immediately if it is a connection error.

    Another option may be to un-check the ā€œIgnore empty resultā€ and put a Router that looks for an empty result set. If empty, send to the Exit snap; otherwise process as usual.

    I understand it’s updating 70 pipelines, but it would be a very consistent and quick fix.

    • PayalS's avatar
      PayalS
      New Contributor II

      Thnx @koryknick for your reply. Much appreciated šŸ™

      I tried this by explicitly filtering the connection errors, after inserting in the Oracle Snap, and putting a exit snap after this. But, this is only stopping that flow, not stopping the parent pipeline because of which Source B continuing sending the data to DIFF and thus deleting all the data from B.

    • PayalS's avatar
      PayalS
      New Contributor II

      @koryknick
      This is already unchecked in my design but still not working, as its again stopping that link from Source A. But again, Source B continuing sending the data to DIFF and ultimately to delete link, thus causing emptying B. ā˜¹ļø

      I tried multiple approaches, but not getting quick fix to it. I ultimately want to send a STOP signal to my pipeline in case of any source connection issue, but after logging the error in the error table.

      • koryknick's avatar
        koryknick
        Employee

        The problem is going to be the logging. The formatter and file writer will wait until the end of your input stream, meaning that all documents will be processed by all input datastreams before it closes.

        Depending on which formatter you’re using, you may be able to ā€œshort outā€ the logging. For example, if you are using a JSON Formatter, you can use the ā€œFormat each documentā€ option, which create a file for each document and allow you to complete logging for the one record and immediately call the Exit snap.