Help with DIFF Snap

Hi,

I am looking for some help regarding the DIFF snap.
Below is the scenario I am working on:

I have two source DBs, A and B. A DIFF snap is used to identify the data eligible for insert, update, and delete, and the corresponding operations are then performed on B, which acts as the target DB.
For example, if there are 10 records in A and 20 records in B, then DIFF will identify 10 records for deletion from B.

Now, the problem I am facing is this:
If, due to a data issue or connection failure, no records come from Source A, then DIFF routes every record from B to the deletion link, which ends up deleting all the data from B.
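Roughly speaking, the comparison behaves like the sketch below (plain Python for illustration only, not the actual snap implementation; I'm assuming records are keyed by a simple `id` field):

```python
def classify(source_a, source_b, key="id"):
    """Classify records the way a DIFF-style compare does.

    source_a / source_b: lists of dicts; `key` identifies a record.
    Returns (inserts, updates, deletes) relative to target B.
    """
    a_by_key = {r[key]: r for r in source_a}
    b_by_key = {r[key]: r for r in source_b}

    inserts = [r for k, r in a_by_key.items() if k not in b_by_key]
    updates = [r for k, r in a_by_key.items()
               if k in b_by_key and r != b_by_key[k]]
    deletes = [r for k, r in b_by_key.items() if k not in a_by_key]
    return inserts, updates, deletes

# If Source A returns nothing (connection failure, data issue),
# every record of B falls into the delete set:
_, _, deletes = classify([], [{"id": i} for i in range(20)])
print(len(deletes))  # 20 -> the whole target would be deleted
```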

I don't want this to happen. I want my pipeline to stop/fail if any such error happens in the source, so that the target can never end up empty.
I am using an Error Pipeline, which tracks the error fine, but it does not stop the main pipeline.

Is there a way I can track the error and stop the pipeline in such scenarios of connection failure or data issues?

Quick help would be really appreciated.

Thanks in Advance,
Payal Srivastava

Hi @PayalS

You can try splitting the pipeline into two pipelines (parent/child).

In the parent pipeline you can add logic that checks whether any records are coming from Source A (e.g. check whether the record count is greater than 0).

In the child pipeline you can add the DIFF snap together with Source B.

If there are no records from Source A, do not call the child pipeline (e.g. by using a Filter snap that stops the flow when there is nothing on the input, and only calls the child pipeline when records are present).
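As a rough sketch of that guard (plain Python rather than actual pipeline configuration; `read_source_a` and `run_child_pipeline` are just placeholders for the Source A read and the child pipeline execution):

```python
def parent_pipeline(read_source_a, run_child_pipeline):
    """Only hand off to the child pipeline when Source A actually returned data."""
    records_a = list(read_source_a())   # the Source A read done in the parent

    if len(records_a) == 0:
        # Nothing came from A: fail the run instead of letting the
        # downstream DIFF treat "empty A" as "delete everything in B".
        raise RuntimeError("Source A returned 0 records - aborting sync")

    # A looks healthy: let the child (DIFF + Source B + writes) run.
    run_child_pipeline(records_a)
```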

This is just an idea that you can try and check if it works for you.

Regards,
Spiro Taleski

Thanks Spiro for the quick reply. Much appreciated. :pray:

Yeah, I thought of the same approach too, but our project already has a total of 70 pipelines (they were created earlier and we identified this bug only later). With this change, we would have to modify all 70 pipelines and, on top of that, create another set of 70 child pipelines.

Therefore, I was wondering whether there is an alternative way to fix this, maybe by just modifying the error pipeline, or some small fix that can be applied only to the 70 pipelines that already exist.

Thanks @PayalS

The other approach I was thinking of is replacing the DIFF snap and implementing custom logic to identify which data should be inserted, updated, or deleted.

But I assume this approach would also require big changes to the existing pipelines.

Regards,
Spiro Taleski

Thanks @Spiro_Taleski
True, that would again require big changes in all 70 pipelines :frowning:

You can examine the error details in your Error Pipeline and call the Exit snap immediately if it is a connection error.

Another option may be to uncheck the “Ignore empty result” setting and add a Router snap that looks for an empty result set. If it is empty, route to the Exit snap; otherwise process as usual.

I understand it means updating 70 pipelines, but it would be a very consistent and quick fix.
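Conceptually, the error handling described above would look something like this (a Python sketch under assumptions; the error-message patterns are made-up placeholders, not actual Oracle or SnapLogic error text):

```python
CONNECTION_ERROR_PATTERNS = (
    "connection refused",   # assumed patterns - adjust them to the real
    "could not connect",    # error text your Error Pipeline receives
    "timed out",
)

def handle_error(error_doc, log_error, stop_pipeline):
    """Log every error, but only stop the run for connection-type failures."""
    log_error(error_doc)  # e.g. insert the error document into the error table first

    reason = str(error_doc.get("error", "")).lower()
    if any(pattern in reason for pattern in CONNECTION_ERROR_PATTERNS):
        # Equivalent of routing to an Exit snap: fail fast so the
        # DIFF/delete flow never gets a chance to empty the target.
        stop_pipeline("Connection failure from source: " + reason)
```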

Thanks @koryknick for your reply. Much appreciated :pray:

I tried this by explicitly filtering the connection errors after the insert in the Oracle snap and putting an Exit snap after that. But this only stops that flow; it does not stop the parent pipeline, so Source B keeps sending data to DIFF and all the data from B still gets deleted.

@koryknick
This is already unchecked in my design, but it still doesn't work, as again it only stops that link from Source A. Source B keeps sending data to DIFF and ultimately to the delete link, which empties B. :frowning:

I have tried multiple approaches but haven't found a quick fix. Ultimately, I want to send a STOP signal to my pipeline in case of any source connection issue, but only after logging the error in the error table.

The problem is going to be the logging. The formatter and file writer will wait until the end of your input stream, meaning that all documents will be processed by all input streams before it closes.

Depending on which formatter you’re using, you may be able to “short out” the logging. For example, if you are using a JSON Formatter, you can enable the “Format each document” option, which creates a file for each document and allows you to complete the logging for that one record and immediately call the Exit snap.

@koryknick
I am using an error table to log the error. I tried using an Oracle insert to log the error, then a Filter for the connection-specific error description, and then an Exit snap, so that as soon as any connection error is logged into the table it reaches the Exit and stops the pipeline.
But this also doesn't seem to work, as the Exit snap is not causing the pipeline to fail. It waits for the entire stream to be processed, and by that time the records from Source B have already reached DIFF and made it into the deletion flow :slightly_frowning_face: Not accomplishing my task :cry:

Try using an Oracle Account that has batch size set to 1. I think you’re running into record batching, which is great for making large volume database updates more efficient, but doesn’t allow you to fail quickly in this case.
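To illustrate why batching matters here, a simplified Python sketch (this is only a conceptual model of buffered writes, not how the Oracle snap is implemented internally):

```python
def write_with_batching(records, write_batch, batch_size):
    """Records only hit the database when a batch fills (or at end of stream).

    With a large batch size an error row can sit in the buffer for a long
    time; with batch_size=1 every record is written - and can fail - right away.
    """
    buffer = []
    for record in records:
        buffer.append(record)
        if len(buffer) >= batch_size:
            write_batch(buffer)   # errors / exit logic can only trigger here
            buffer = []
    if buffer:
        write_batch(buffer)       # leftovers are flushed at end of stream
```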

Thanks @koryknick
I tried this as well; it was the most recent thing I did. I set the batch size to 1, thinking the record would reach the Exit snap and stop my pipeline. Unfortunately, this didn't work either. I don't know if I am missing something or if SnapLogic is behaving oddly, because in the logs I can see the record flowing to the Exit snap, yet it still does not stop the pipeline :cry:

@koryknick @Spiro_Taleski
Thanks for your valuable comments :slight_smile: :pray:
I just want to share the good news that I have finally been able to crack this.
Below are the workarounds I used to make it a success:

  1. Updated the error pipeline to insert error records into the DB.
  2. Filtered the connection-specific errors.
  3. Used an Exit snap with the threshold value set to ‘0’.
  4. Created a separate account with batch size 1 for this error pipeline.

With this, my error pipeline stops the parent pipeline only for those particular errors and lets it continue for all other data failures.

Thanks,
Payal Srivastava


Well done @PayalS! I’m sorry that my advice didn’t get you to this sooner, but I’m very glad for you that you found such a workable solution. And that you didn’t need to update 70 pipelines!
