Pipeline shows success on last retry in log but failure in dashboard

vincenr
New Contributor III

Before I dive into the problem we are having, I will briefly summarize our overall process for importing data into our data warehouse. We generally use one pipeline to do extracts, with output to .json files, and then a separate pipeline to process the JSON and load the data into staging tables in our database. From there, we run a script that runs a stored procedure to load from staging into our EDW.

We have a very strange problem. We discovered one of the tasks for an extract pipeline had the pipeline incorrectly set so that it was running a different pipeline than intended. So the extract worked fine, but of course, the load to our staging db failed.

However, the console showed four failures (1 original and 3 retries, which is how our script is set up). But the log file for the load pipeline, which our script writes to, shows the original failure, two retry failures, and success on the last one!

I did notice that our script waits 15 seconds before rerunning the curl command. Is that enough time to ensure we don’t reinvoke the SnapLogic pipeline too soon? If it were executed again too soon, could that cause a “successful” run of the pipeline, in other words, a false positive?

There’s got to be a better way to detect if what’s being sent back indicates true success or is a false success.

Here’s the parent code that runs the curl command:

# Re-run if failed
for retry_cnt in {1..3}
do
  if [ "$RESULT" = "FAILED" ]
  then
    echo -e "Retry - Run $retry_cnt - Start: $(date +"%D %T")" >> "$LOGFILE"

    # Start new temp log
    echo "" > "$TMPLOGFILE"

    # Reset the process status for retry
    RESULT=SUCCESS

    #sleep 15
    sleep 50
    run_pipeline_task
  fi
done
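One thing worth checking in a loop like this: RESULT is reset to SUCCESS before every retry, so any attempt in which the called function never sets it back to FAILED is logged as a success. The sketch below takes the opposite default. It assumes run_pipeline_task can be changed to return a non-zero exit status on failure (the real function is not shown as text, so that is a guess), and `retry_task` with its parameters is a hypothetical helper, not part of the original script.

```shell
#!/usr/bin/env bash

# retry_task MAX_RETRIES DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND once, then retries up to MAX_RETRIES times.
# RESULT starts as FAILED and only becomes SUCCESS when COMMAND exits with 0.
retry_task() {
  local max_retries=$1 delay=$2
  shift 2
  local attempt=0
  RESULT=FAILED

  while [ "$attempt" -le "$max_retries" ]; do
    if "$@"; then
      RESULT=SUCCESS
      return 0
    fi
    attempt=$((attempt + 1))
    # Wait only when another attempt is still going to happen
    if [ "$attempt" -le "$max_retries" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Example call (assumes run_pipeline_task returns non-zero on failure):
#   retry_task 3 50 run_pipeline_task
```

With this shape, a run that produces no clear signal either way stays FAILED instead of silently inheriting SUCCESS.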

Here’s the function called by the code above:

[image]

UPDATE: I created a ‘test’ version of the script and set the delay to 5 seconds instead of 50 or 15, and it seems to return a false result of ‘Ok’ (i.e. SUCCESS) more often.

I tested our process and captured the output from the curl command and there was absolutely no difference between a true success and the false success.

Here’s what I captured in my test version of the script (unfortunately, there was no difference):

FALSE SUCCESS

PIPELINE_OUTPUT: [

]

TRUE SUCCESS

PIPELINE_OUTPUT: [

]

Lastly, I’ve gone through the whole list of switches on the curl man page and didn’t find anything useful that might return more information about whether the pipeline run truly succeeded.
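For reference, curl can report more than the response body: `--write-out '%{http_code}'` prints the HTTP status, `--dump-header` saves the response headers, and `--output` saves the body. Whether a false success differs from a true one in any of these is not known from the captures above, so the following is only a sketch of how to collect them. `invoke_task`, `classify_status`, `TOKEN`, and `TASK_URL` are hypothetical names, and bearer-token authentication is an assumption.

```shell
#!/usr/bin/env bash

# invoke_task URL HEADER_FILE BODY_FILE
# Calls the task URL, saves response headers and body to files,
# and prints only the HTTP status code on stdout.
# TOKEN is a hypothetical variable holding the task's bearer token.
invoke_task() {
  local url=$1 header_file=$2 body_file=$3
  curl --silent --show-error \
       --header "Authorization: Bearer ${TOKEN:-}" \
       --dump-header "$header_file" \
       --output "$body_file" \
       --write-out '%{http_code}' \
       "$url"
}

# classify_status HTTP_CODE
# Prints SUCCESS for any 2xx code and FAILED for everything else,
# including 000, which curl reports when no HTTP response was received.
classify_status() {
  case "$1" in
    2[0-9][0-9]) echo SUCCESS ;;
    *)           echo FAILED ;;
  esac
}

# Example (TASK_URL is a placeholder for the real task URL):
#   status=$(invoke_task "$TASK_URL" headers.txt body.json)
#   RESULT=$(classify_status "$status")
```

A 2xx status only shows that the request was accepted; it does not by itself prove the pipeline finished successfully, so the saved headers are worth comparing between a true and a false success.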

1 REPLY

alchemiz
Contributor III

Hello… you can always get the runtime ID from the response header.

Then, based on this runtime ID, invoke the SnapLogic public API:

https://elastic.snaplogic.com/api/1/rest/public/runtime/{org}/{runtimeid}
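A rough sketch of that two-step check in bash, assuming the headers were saved with `curl --dump-header`. The name of the header that carries the runtime ID is not given here, so it is passed in as a parameter and must be checked against a real response. The `SL_USER`/`SL_PASS` credentials, basic authentication, and all function names are assumptions for illustration.

```shell
#!/usr/bin/env bash

# get_header_value HEADER_FILE HEADER_NAME
# Prints the value of the named header (case-insensitive match) from a
# file written by curl --dump-header, trimmed of spaces and the trailing CR.
get_header_value() {
  local header_file=$1 name=$2
  awk -v name="$name" '
    {
      sep = index($0, ":")
      if (sep == 0) next                      # status line or blank line
      key = substr($0, 1, sep - 1)
      if (tolower(key) == tolower(name)) {
        value = substr($0, sep + 1)
        gsub(/^[ \t]+|[ \t\r]+$/, "", value)  # trim whitespace and CR
        print value
        exit
      }
    }
  ' "$header_file"
}

# runtime_url ORG RUNTIME_ID
# Builds the public runtime API URL in the form given above.
runtime_url() {
  printf 'https://elastic.snaplogic.com/api/1/rest/public/runtime/%s/%s\n' "$1" "$2"
}

# fetch_runtime ORG RUNTIME_ID
# Prints the API response for one pipeline run.
# SL_USER and SL_PASS are hypothetical credential variables.
fetch_runtime() {
  curl --silent --show-error --fail \
       --user "${SL_USER:-}:${SL_PASS:-}" \
       "$(runtime_url "$1" "$2")"
}

# Example (header name and org are placeholders):
#   ruuid=$(get_header_value headers.txt "Some-Runtime-Id-Header")
#   fetch_runtime "my-org" "$ruuid" > runtime.json
```

The response should describe the run identified by the runtime ID; which field holds the final state is best confirmed in the SnapLogic API documentation before the script relies on it.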

Hope this helps