Monitoring Pipeline Status with an External Scheduler

In our organization, SnapLogic is one of many tools used to integrate data. We use an enterprise scheduler to manage execution, alerting, and dependencies across all platforms and technologies. We use Cisco's Tidal Enterprise Scheduler to execute SnapLogic pipelines, SSIS packages, Informatica workflows, FTP file transfers, command-line executables, etc.

To expose a pipeline to an external scheduler, we create a triggered task and give the exposed API URL to the Webservice adapter within Tidal. Tidal executes the pipeline and gets a response of “200 - OK” because the pipeline task was triggered successfully. This tells us only that the pipeline kicked off successfully, not that it finished successfully.

To catch failures, we use System Center Operations Manager to call the summary pipeline status API. Any failures it returns are sent to our IT Operations team, which triages them and notifies the responsible parties.

We’ve been running this way for a while, and it has worked well enough. Now we’re exposing SnapLogic to more projects and more development groups, and as a result the demands for successful execution and for managing downstream dependencies have increased. We need our scheduler to know when jobs succeed, fail, or run long, and we need each team to be notified of its own pipeline failures.

From here on I’m talking theory. I’m very interested in what others have come up with as a solution to enterprise scheduling.

Since the only response the scheduler gets back from the REST API call is 200 - OK, we can’t rely on it to determine whether the job succeeded. SnapLogic has published a set of APIs that return the status of an individual pipeline. If we can make our scheduler depend on the result of a subsequent status call, we should be able to alert accordingly.

To accomplish this, I’m attempting to implement the following (haven’t connected all the dots yet):

  1. Add a Mapper to each parent pipeline that has an open output view and returns the URL used to monitor this pipeline (+pipe.ruuid).
  2. Create a Tidal job (a) to call the pipeline task that does the actual integration.
  3. Create a Tidal job (b), dependent on the success of (a), that repeatedly calls the monitoring URL returned from (a) at a short interval and logs the returned status to a Tidal variable.
  4. If (b) returns “Running”, keep trying. If (b) returns “Failed”, fail the job. If (b) returns success, mark the job as successful.
  5. Create Tidal job (c), the next actual integration, dependent on both the success of (b) and a value of “Success” in the Tidal variable.
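A rough sketch of the polling logic in steps 3 and 4, written as a standalone Python script that a scheduler job could run. It assumes a `get_status` callback that calls the monitoring URL and returns one of the state names the monitoring API uses (there is no literal “Running” state, so anything other than Completed, Failed, or Stopped is treated as still running). The exit codes and the timeout are hypothetical choices, not Tidal conventions; the timeout is one way to flag jobs that run long.

```python
import time
from typing import Callable

# Hypothetical exit codes; map them to whatever your scheduler expects.
EXIT_SUCCESS = 0
EXIT_FAILED = 1
EXIT_TIMEOUT = 2


def wait_for_pipeline(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Poll a status callback until the pipeline reaches a final state.

    ASSUMPTION: get_status calls the monitoring URL returned by job (a)
    and returns a state string such as "Started", "Completed" or "Failed".
    """
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status == "Completed":
            return EXIT_SUCCESS
        if status in ("Failed", "Stopped"):
            return EXIT_FAILED
        # Any other state (Prepared, Queued, Started, Stopping, ...) means
        # the run is not finished yet.
        if clock() >= deadline:
            return EXIT_TIMEOUT  # lets the scheduler flag a long-running job
        sleep(poll_seconds)
```

Because the result is a plain exit code, job (c) could in principle depend on job (b)'s success alone, without the Tidal variable, if your scheduler treats a nonzero exit code as a failure.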

This is quite a bit of tedium just to handle the success or failure of a job, and although I haven’t implemented this solution successfully yet, I feel like it’s within reach.

What solutions have others come up with for managing dependencies and alerting across your enterprise?


Brett
The practice of adding an independent Mapper Snap with an unterminated output that returns the pipeline runtime ID (or indeed the full URL) to the host scheduler is currently best practice. It also indicates that the pipeline itself has been prepared and started successfully.
Craig

This is what we have seen and implemented in the field:

Workflow

· Schedule a JOB (a triggered task run as a scheduled job via an external job utility, e.g. Control-M) and get the run_id (shown below).
· Pass the run_id to the monitoring JOB as an HTTP query-string parameter.
· For multiple JOBs, follow the same process.

Actual JOB
https://elastic.snaplogic.com/api/1/rest/slsched/feed/someORG/Test%20pipeline%20Task?bearer_token=UaDb8qnExYtloAbo6PcryU12X6JSEuWa

Sample Response
[
{"pipeline_name": "Test pipeline", "run_id": "a2890a41-5dc9-4f48-80cd-453fcb25ba0b", "status": "Started"}
]
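On the scheduler side, the run_id can be pulled out of the task's response with a few lines of Python. This is a minimal sketch that assumes the response body is a JSON array shaped like the sample above; `extract_run_ids` is a hypothetical helper name.

```python
import json


def extract_run_ids(response_body: str) -> list[str]:
    """Return the run_id values from a triggered task's JSON response.

    ASSUMPTION: the body is a JSON array of objects like the sample above,
    written by the independent Mapper with an open output view.
    """
    documents = json.loads(response_body)
    # Skip any documents that do not carry a run_id.
    return [doc["run_id"] for doc in documents if "run_id" in doc]
```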

In the pipeline behind your task, add a Mapper with an open output view. You can simply add it as an independent Snap in the pipeline. The Mapper needs two properties:

pipe.ruuid, which returns the runtime UUID, and status, set to ‘Started’.

Monitoring JOB
https://elastic.snaplogic.com/api/1/rest/slsched/feed/SomeORG/pl_check_pipeline_execution_status%20task?bearer_token=7aG0YP0OpnY2AaOCSUwju4XVMOJfJ4CU&run_id=a487e390-296e-4c97-9417-28611d82ddd2

Sample Response
[
{"run_id": "a487e390-296e-4c97-9417-28611d82ddd2", "status": "Completed"}
]
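A sketch of the scheduler-side half of the monitoring JOB: appending the run_id as a query-string parameter and reading the status back out. It assumes the monitoring task's response is shaped like the sample above; the helper names and the "Unknown" fallback are hypothetical, and TOKEN in the test stands in for a real bearer token.

```python
import json
from urllib.parse import urlencode


def monitoring_url(base_task_url: str, run_id: str) -> str:
    """Append run_id as a query-string parameter to the monitoring task URL."""
    # The task URL may already carry a query string (e.g. bearer_token).
    separator = "&" if "?" in base_task_url else "?"
    return base_task_url + separator + urlencode({"run_id": run_id})


def status_for(response_body: str, run_id: str) -> str:
    """Return the status reported for run_id, or "Unknown" if it is absent.

    ASSUMPTION: the body is a JSON array of {"run_id": ..., "status": ...}.
    """
    for doc in json.loads(response_body):
        if doc.get("run_id") == run_id:
            return doc["status"]
    return "Unknown"
```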

More details on the monitoring API provided by SnapLogic

To get pipeline status:

For example, you can invoke this URL to get the status of all pipelines in an org (in this case ConnectFasterInc) that are in the listed states:

GET call
https://elastic.snaplogic.com/api/1/rest/public/runtime/ConnectFasterInc?state=Completed,Failed,Started

The state parameter takes a comma-separated list; valid values are [NoUpdate, Prepared, Started, Queued, Stopped, Stopping, Completed, Failed] (case sensitive).
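Since the state values are case sensitive, it may help to validate them before building the URL. A minimal Python sketch using the host and path from the example above; `runtime_url` is a hypothetical helper, and it only builds the URL without sending the request.

```python
from urllib.parse import quote

# Valid values for the state parameter, as listed above (case sensitive).
VALID_STATES = {"NoUpdate", "Prepared", "Started", "Queued",
                "Stopped", "Stopping", "Completed", "Failed"}


def runtime_url(org: str, states: list[str]) -> str:
    """Build the public runtime GET URL for an org, filtered by state."""
    bad = [s for s in states if s not in VALID_STATES]
    if bad:
        # Catches casing mistakes such as "completed" before the call is made.
        raise ValueError(f"invalid state(s): {bad}")
    return ("https://elastic.snaplogic.com/api/1/rest/public/runtime/"
            + quote(org) + "?state=" + ",".join(states))
```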

This returns JSON with details of the matching pipelines. From there you can filter out (e.g. with a Mapper) the pipe_id (the unique runtime UUID) and pass it to another call to stop/start the pipeline or get the error logs associated with it. This is another way of getting a pipeline's runtime UUID.
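The filtering step could look like this in Python. The exact layout of the runtime response is not shown here, so this sketch assumes the per-pipeline entries have already been extracted into a list of dicts with `pipe_id` and `state` keys; the `state` key name in particular is an assumption to check against the monitoring API documentation, and `pipe_ids_in_state` is a hypothetical helper.

```python
def pipe_ids_in_state(entries: list[dict], state: str) -> list[str]:
    """Return the pipe_id of every entry whose state matches exactly.

    ASSUMPTION: each entry is a dict with "pipe_id" and "state" keys.
    State names are case sensitive, so "Failed" and "failed" differ.
    """
    return [e["pipe_id"] for e in entries if e.get("state") == state]
```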

POST call
https://elastic.snaplogic.com/api/1/rest/public/runtime/stop/ConnectFasterInc/pipe_id

pipe_id is a variable that contains the run_id of a given pipeline.

Both of these calls require you to pass your Elastic username and password as HTTP Basic authentication.
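A sketch of the stop call using only the Python standard library, building the Basic authentication header by hand. The URL pattern is the one shown above; `stop_request` is a hypothetical helper, and it only builds the request, which you would then send with `urllib.request.urlopen`.

```python
import base64
import urllib.request
from urllib.parse import quote


def stop_request(org: str, pipe_id: str,
                 username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the POST call that stops a pipeline run."""
    url = ("https://elastic.snaplogic.com/api/1/rest/public/runtime/stop/"
           + quote(org) + "/" + quote(pipe_id))
    # HTTP Basic auth: base64 of "username:password".
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=b"",  # empty body; the run to stop is identified by the URL
        method="POST",
        headers={"Authorization": "Basic " + token},
    )
```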

More details: http://doc.snaplogic.com/monitoring-api

@brettdorsey, your use case is somewhat complex for me to follow since I’m unfamiliar with your scheduling and monitoring tools, but if you use the Exit Snap strategically within your pipeline, the task will return an HTTP response of 500 instead of 200.
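If the task does return a 500 on failure, a scheduler job could turn that directly into a failing exit code. A minimal Python sketch, assuming your scheduler treats a nonzero exit code as job failure; `trigger_exit_code` is a hypothetical helper, and the `opener` parameter exists only so the HTTP call can be swapped out in tests.

```python
import urllib.error
import urllib.request


def trigger_exit_code(url: str, opener=urllib.request.urlopen) -> int:
    """Call a triggered task and turn the HTTP outcome into an exit code.

    A 2xx response maps to 0. If the pipeline hits an Exit Snap and the
    task returns 500, urlopen raises HTTPError, which maps to 1.
    """
    try:
        with opener(url) as response:
            return 0 if 200 <= response.status < 300 else 1
    except urllib.error.HTTPError:
        return 1
```

A wrapper script would then end with `sys.exit(trigger_exit_code(task_url))`. Network-level failures (`urllib.error.URLError`) are not caught and will also end the script with a nonzero status.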