Copy snap creating output that is multiplied by number of records

I have a pipeline that creates files based upon 490 records in an Oracle table. I create the output by doing a select on static headings that I create and union it with actual records from the table and then I sftp the file. I wanted to also be able to email the file, so snapLogic support suggested adding a Copy snap and sending one to sftp and the other to email, so I added a Copy snap right after the Oracle Execute snap that then went to a CSV Formatter snap and then to a File Writer snap to sftp. The copy also went to a CSV Formatter and then a File Writer to create a file in my directory and finally an email snap to send the file. All of this worked EXCEPT that the listing I received was 491 records (which included a header) TIMES 490!!! For whatever reason, it repeated the copy 490 times!!! Any ideas as to why???

Hello @clarrivey, and welcome to the SnapLogic Community!

Can you share a screenshot of how you have your pipeline connected, or export the pipeline so I can look at it more closely?

Here is a screen shot (with the Oracle Execute right before it):


SL_Screenshot.docx (292.2 KB)

Copy Snap shouldn’t be the issue. It just duplicates the data to send to multiple endpoints. I see that Oracle Execute has input view. Can you send the slp file.

Also Dashboard is good place to monitor the records.

I agree @skatpally - next steps are to view the Dashboard statistics for the job so we can see the number of records into the Oracle Execute and number of records out, along with taking a closer look at the pipeline snap settings.

@clarrivey - if you can export your pipeline and upload the SLP file here, it will give us more of the information we need to help.

I agree - the copy SHOULD do no more than just that (but maybe it’s doing more???)!!!
I’m rather a newbie at this, so what is the slp file and how do I access/send it???

Connect_Ed_DEVL_2021_07_01.slp (64.9 KB)

@clarrivey - I’m assuming that the Oracle Insert is receiving multiple records. Further, if you look in the Dashboard, I would guess that you will see the Oracle - Execute Update SWEADTE snap is executing 490 times, and the Oracle - Execute3 is executing 490 times as well, pushing a great deal more data to the Copy and remaining downstream snaps than you are expecting.

My recommendation is that you either:

  1. Split this pipeline into two parts such that the first ends after the Oracle Insert snap
    -OR-
  2. Add a Tail snap after the Oracle Insert to wait for the upstream input to complete before moving on to the Oracle - Execute Update SWEADTE and other downstream snaps

Kory,
Here is the sql for the Oracle - Execute3:
select ‘ReferenceCode’||chr(09)||‘FirstName’||chr(09)||‘LastName’||chr(09)||‘Status’||chr(09)||‘Gender’||chr(09)||‘RefreshGroup’||chr(09)||‘ContactType’||chr(09)||‘HomePhone’||chr(09)||‘PrimaryPhone’||chr(09)||‘TTYPhone’||chr(09)||‘EmailAddress’||chr(09)||‘EmailAddressAlt’||chr(09)||‘SMSPhone’||chr(09)||‘Institution’
from sys.dual
union
select
rtrim(id)||chr(09)||
rtrim(fname)||chr(09)||
rtrim(lname)||chr(09)||
rtrim(ecls_code)||chr(09)||
rtrim(gender)||chr(09)||
rtrim(dept)||chr(09)||
rtrim(contact_type)||chr(09)||
rtrim(hr_phone)||chr(09)||
rtrim(primary_phone)||chr(09)||
rtrim(tty_phone)||chr(09)||
rtrim(ca_email)||chr(09)||
rtrim(primary_email)||chr(09)||
rtrim(sms_phone)||chr(09)||
‘WHEATON COLLEGE’
from sweadte
When I run it in sql (via TOAD), I get 491 output records. THAT is what I would expect to be in my Copy! I understand that Oracle - Execute Update SWEADTE snap is updating 490 records, but I don’t really understand what you mean that they are “pushing a great deal more data to the Copy…”? Please forgive my ignorance, but I thought the snaps executed in order (i.e. 2nd one didn’t run until 1st one was complete). It doesn’t make any sense to me to split the pipeline - it SHOULD be able to handle it! And I’m also not sure I fully understand the significance of a Tail snap, so I will have to look that up in the documentation.

One other quick question: in the View(s), you have things like input0 and output0. Can these stay the same throughout the pipeline, or do they need to be updated?

The view names are simply a way to reference the data as it streams - it doesn’t affect the data as it moves.

If you could provide a screenshot of the Pipeline Execution Statistics for this job, that might help me explain to you what is happening within the pipeline. You can get this either from Designer (if you are running manually), from this button in the toolbar:
Screen Shot 2021-07-01 at 10.09.31 AM

Or by clicking the Status text from Dashboard:
Screen Shot 2021-07-01 at 10.10.30 AM

SLPL.docx (1.2 MB)

It looks like the Oracle - Execute3 is taking the input 490 records and multiplying it by 491 (includes the header). But why?

So, when I add the Tail snap in, it only delivers 1 record - the header!

Interesting…when I pull it out of the pipeline and create a new pipeline, it works as it should:

Thank you for the pipeline statistics - that is helpful to explain. I see that you already added the Tail snap in, but it looks like you added it after the Oracle Execute 3 snap, which I believe you are using to generate your file output including the header. You should add the Tail immediately following the Oracle Insert snap.

Here is why: The pipeline is reading from Oracle and writing to “NBRJOBS_new.csv” file, then reading a file and performing Oracle Inserts (one document at a time). Keep in mind that the Oracle Insert snap passes the update information for EACH document to the output view… so 490 document in is 490 document out of that snap. This means that the Oracle - Execute Update SWEADTE snap is being executed 490 times (once for each input document), and I don’t believe this is what you want.

By adding the Tail snap before that Oracle - Execute Update SWEADTE snap, you are limiting the number of documents going into that snap to 1, so the snap will execute only once.

This is an exercise in understanding the behavior of each snap in the pipeline. The documentation of the snaps should help with that.

1 Like

Thank you Kory, that was VERY helpful. It is now working as expected. I am much obliged!

-Chris

@clarrivey – glad you finally solved the problem. I can’t tell from reading the above whether you ran afoul of one of Snaplogic’s fundamental “gotchas”, but one that I’ve seen happen with our team here is the danger of thinking of snaps as individual program steps instead of programs themselves.

Interestingly enough, this is a trap that experienced programmers seem to fall into more often than “citizen implementors” who don’t have a formal programming background.

It helps me to think of each snap as a separate program, each of which can be running in parallel, and the connection between each pair of snaps is a queue. A pile of documents get spit out of a query snap or other data-producing snap, and they conceptually sit in a queue as the next snap processes each in sequence and puts the output docs one by one into the queue for the snap that follows. It is a true pipeline and there can be different docs found in different parts of the pipeline being processed simultaneously.

I’ve found this mental model to be particularly helpful in avoiding the “rat in the snake” problem (lots of data in one big glob that can only get through the pipeline one snap at a time, when there’s no need for it to do so). It also helps me to reason about decisions in the pipeline where it actually does need to wait on all the data to arrive so it can aggregate things.

Hope This Helps!

1 Like