Feature Request: Capture Pipeline Data During Execution

Often when designing and debugging a Pipeline, the Validate function can't be used, because the Pipeline is only designed to be executed as a sub-Pipeline or as a Triggered Task. If we execute the parent Pipeline or trigger the task, the full document data at each step in the Pipeline can't be browsed; we have to rely on the logs in the Dashboard.

Our workaround is to insert a Copy > JSON Formatter > File Writer series of Snaps at various points in the Pipeline. However, this needs to be repeated at every relevant point in the Pipeline (almost after every Snap!), is messy to create and remove, and creates a whole heap of files in SLDB that need to be searched and cleaned up.
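Conceptually, each of those taps does something like the following rough Python sketch (for illustration only - the function name and file layout are made up, and documents are assumed to be JSON-serializable):

```python
import json
from pathlib import Path

def debug_tap(documents, stage_name, out_dir="debug_captures"):
    """Write a copy of the documents to a JSON file for inspection, then
    pass them through unchanged: the same role as a Copy > JSON Formatter >
    File Writer branch hanging off the main flow."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    docs = list(documents)                            # "Copy": keep the originals
    out_file = Path(out_dir) / f"{stage_name}.json"
    out_file.write_text(json.dumps(docs, indent=2))   # "JSON Formatter" + "File Writer"
    return docs                                       # downstream steps continue as normal

# Dropped in after a step of interest:
docs = debug_tap([{"id": 1, "status": "new"}], "after_mapper")
```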

We’ve requested a feature to execute a Pipeline (triggered, scheduled or Pipeline Execute) with a ‘capture data’ setting that allows the full input/output document data of each Snap to be browsed in the Dashboard (Service Request #18185).

I’d be keen to hear any thoughts or feedback from the community - is this a feature you would like to see too?

Quick question - are you only interested in the last n documents, for a fairly small value of n? Otherwise, this could get into the hundreds of megabytes or gigabytes in many cases, which is difficult to view in the browser.

Also, have you tried the Record Replay Snap? You can use it to capture, say, the first 10 documents sent by the parent at the beginning of a pipeline, and then have those docs available in the cache to use during validation. We often use it internally when connecting to arbitrary web endpoints.
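The rough idea is a record/replay cache (a simplified sketch below, not the Snap's actual implementation): a real run saves off the first N documents, and a validation run replays them so downstream logic still gets realistic input without calling the live source again.

```python
import json
from pathlib import Path

class RecordReplayCache:
    """Simplified illustration of a record/replay cache: real runs save the
    first `limit` documents; validation runs read them back instead of
    calling the upstream source again."""

    def __init__(self, cache_file="replay_cache.json", limit=10):
        self.cache_file = Path(cache_file)
        self.limit = limit

    def record(self, documents):
        docs = list(documents)
        # Save only a small sample; the full stream still flows downstream.
        self.cache_file.write_text(json.dumps(docs[: self.limit], indent=2))
        return docs

    def replay(self):
        # Used at validation time, when no live parent or endpoint is available.
        return json.loads(self.cache_file.read_text())
```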

Shayne

Thanks for your reply.

Yes, we would only want to capture one execution, or a small number - agree that we don’t want to be capturing huge volumes of data, and this should only be for debugging.

I haven’t tried the Record/Replay Snap yet, but I might have a play around with it.

However, the key is being able to see the data at every step - when debugging complex pipelines, it can often be difficult to understand the exact structure of the JSON at each step and why certain Snaps are failing. I wouldn't want to have to put a Record/Replay between every Snap in my Pipeline - that seems like overkill!

This is something on our wishlist as well… oftentimes we are trying to debug a particular document as it goes through the pipeline and see what the output is at certain steps. Right now, we're inserting a Copy Snap and routing the other flow to a JSON Formatter and File Writer. It works, but it definitely isn't ideal, especially if you want to see the output after a number of separate steps.

To clarify, is this during design time or is it when you are troubleshooting a failed execution and trying to make determinations there?

I typically use it to troubleshoot failed executions, but I also use it during design time, as we use data backups when testing out our designs.

While this enhancement request is being looked into, is there a chance that we could call a single pipeline from multiple output terminals?

For example, as mentioned in the post, we can create a Copy > JSON Formatter > File Writer pipeline to record the input/output at multiple stages of a pipeline. So we make an independent pipeline to do this and trigger it with a Pipeline Execute Snap. But when we want it called, say, 10 times during the course of execution, we need 10 Pipeline Execute Snaps - instead, could we have one Pipeline Execute Snap called multiple times?
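To illustrate the idea in plain Python (hypothetical names, nothing SnapLogic-specific): one shared routine stands in for the single child pipeline, and any number of capture points route their documents to it with a stage label instead of duplicating the capture logic.

```python
import json

def capture_child(batch):
    """Stands in for one reusable child pipeline: every capture point sends
    its documents here rather than carrying its own copy of the logic."""
    for doc in batch:
        print(json.dumps(doc))

# Two different stages reuse the same child, tagged with where they came from:
after_parse = [{"id": 1}, {"id": 2}]
after_validation = [{"id": 1, "valid": True}]
capture_child([{"stage": "after_parse", "doc": d} for d in after_parse])
capture_child([{"stage": "after_validation", "doc": d} for d in after_validation])
```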

This feature would be especially useful when designing error handling mechanisms.

/Krupali

As an aside, there is some work being done for error handling that we hope to have soon.