08-23-2022 01:46 PM
Before I dive into the problem we are having, I will briefly summarize our overall process for importing data into our data warehouse. We generally use one pipeline to do extracts, with output to .json files, and then a separate pipeline to process the JSON and load data into staging tables in our database. From there, we run a script that runs a stored procedure to load from staging into our EDW.
We have a very strange problem. We discovered one of the tasks for an extract pipeline had the pipeline incorrectly set so that it was running a different pipeline than intended. So the extract worked fine, but of course, the load to our staging db failed.
However, the console showed four failures (1 original and 3 retries, which is how our script is set up). But the log file for the load pipeline, which our script writes to, shows the one original failure, two retry failures, and success on the last one!
I did notice a timing delay before our script reruns the curl command, which is 15 sec. Is that enough time to ensure we don't try to reinvoke the SnapLogic pipeline too soon, so that there are no problems? If it were executed again too soon, could that cause a "successful" run of the pipeline, in other words, a false positive?
There's got to be a better way to detect whether what's being sent back indicates true success or a false success.
Here's the parent code that runs the curl command:
for retry_cnt in {1..3}
do
    if [ "$RESULT" = "FAILED" ]
    then
        echo -e "Retry - Run $retry_cnt - Start: $(date +"%D %T")\n" >> "$LOGFILE"
        # Start new temp log
        echo "" > "$TMPLOGFILE"
        # Reset the process status for retry
        RESULT=SUCCESS
        #sleep 15
        sleep 50
        run_pipeline_task
    fi
done
Here's the function called by the code above:
UPDATE: I created a "test" version of the script and set the delay to 5 seconds instead of 50 or 15, and it seems to return a false result of "Ok" (aka SUCCESS) more often.
I tested our process and captured the output from the curl command and there was absolutely no difference between a true success and the false success.
Here's what I captured in my test version of the script (unfortunately, there was no difference):
FALSE SUCCESS
PIPELINE_OUTPUT: [
]
TRUE SUCCESS
PIPELINE_OUTPUT: [
]
Lastly, I've gone through the whole list of switches on the curl man page and didn't find anything useful that might return more information about whether the pipeline run truly succeeded.
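For reference, two curl switches that do expose more than the response body are -w '%{http_code}' (prints the HTTP status code) and -D (writes the response headers to a file). The sketch below shows one way they might be combined. TASK_URL and TOKEN are hypothetical placeholders, and bearer-token authentication is an assumption, not something taken from the script above. A false success may well still return a 2xx code, so this is only extra evidence, not a guaranteed fix.

```shell
#!/usr/bin/env bash
# Sketch: capture the HTTP status code and response headers of a curl call.
# TASK_URL and TOKEN are hypothetical placeholders (assumptions).

# Return 0 when the given HTTP status code is in the 2xx range.
is_http_success() {
    case "$1" in
        2[0-9][0-9]) return 0 ;;
        *)           return 1 ;;
    esac
}

# Invoke the task at URL $1: body is saved to $2, headers to $3,
# and the status code is printed on stdout.
invoke_task() {
    curl -s -o "$2" -D "$3" -w '%{http_code}' \
         -H "Authorization: Bearer $TOKEN" "$1"
}

# Usage (not executed here):
#   status=$(invoke_task "$TASK_URL" body.json headers.txt)
#   if is_http_success "$status"; then RESULT=SUCCESS; else RESULT=FAILED; fi
```

Keeping the header file around also makes it possible to compare a true success against a false one beyond the empty body.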
08-27-2022 02:36 PM
Hello... you can always get the runtime ID from the response header.
Then, based on this runtime ID, invoke the SnapLogic public API:
https://elastic.snaplogic.com/api/1/rest/public/runtime/{org}/{runtimeid}
Hope this helps
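A minimal sketch of that idea, assuming the response headers were saved to a file (for example with curl -D). The header name X-SL-RUUID is an assumption here, so check your own header dump for the real name. extract_header and runtime_url are illustrative helpers, not part of any SnapLogic tooling.

```shell
#!/usr/bin/env bash
# Sketch: read a runtime ID from saved response headers and build the
# public runtime API URL. RUNTIME_HEADER is an ASSUMED header name.
RUNTIME_HEADER="${RUNTIME_HEADER:-X-SL-RUUID}"

# Print the value of header $2 found in header file $1.
# The header name is matched case-insensitively; only the first match is used.
extract_header() {
    awk -v name="$2" '
        BEGIN { name = tolower(name) ":" }
        tolower(substr($0, 1, length(name))) == name {
            value = substr($0, length(name) + 1)
            gsub(/^[ \t]+|[ \t\r]+$/, "", value)   # trim spaces and trailing CR
            print value
            exit
        }' "$1"
}

# Build the runtime status URL for org $1 and runtime ID $2.
runtime_url() {
    printf 'https://elastic.snaplogic.com/api/1/rest/public/runtime/%s/%s\n' "$1" "$2"
}

# Usage (not executed here; ORG and credentials are placeholders):
#   ruuid=$(extract_header headers.txt "$RUNTIME_HEADER")
#   curl -s -u "$SL_USER:$SL_PASS" "$(runtime_url "$ORG" "$ruuid")"
```

The JSON returned by that last call should describe the run itself, so the script can decide between SUCCESS and FAILED from the reported pipeline state rather than from the empty trigger response. The exact field names are not shown in this thread, so inspect one response before parsing it.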