How to pass a variable along the pipeline?

I have a pipeline that looks like this:

The Snaps are:

  1. List all files in a given Box directory
  2. Filter the files according to a mask
  3. Read the matching Excel file into the pipeline
  4. Convert the first Worksheet to CSV
  5. Write a CSV file to an archive folder in Box
  6. Delete the original Excel file

Everything works up to step 5, where I need to access the output variables of step 2 again. The FileName is needed to create the matching file name in the archive folder (this time with a .csv extension), and the FileId is needed in step 6 to delete the Excel file.

The business reason behind doing this is that the file coming into the Box folder will have a date prefix at the front of its name. There is no set frequency for this file; it arrives at practically random intervals. Therefore, I have no way to hard-code the actual file name into the Pipeline and must dynamically check for a new file every day.

How can I store the output fields from step 2 somewhere in memory so that the final two Snaps in the pipeline can access them?


Note that you are combining ALL of the Excel files found in the directory into a single CSV file. Is that really your intention here?

You’ll need to move part of the pipeline into a child pipeline and pass the file name as a parameter to the child. I’m going to guess you probably don’t want to combine all of the Excel files into one file, but rather have one CSV file per input file. Using a child pipeline will fix that issue as well.

So, I would move the “Read XLSX” and following snaps into the child pipeline and add a filename parameter. Then you can place a PipelineExecute snap after the Filter to kick off the child and pass down the file name from the document output by the Box Directory Browser.
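To make the parent/child split a bit more concrete, here is a rough Python sketch of the idea. It is not SnapLogic code: `parent_pipeline`, `child_pipeline`, and the `.xlsx` mask are hypothetical stand-ins, and only the `FileName` and `FileId` field names come from the question. The point is that the child receives both values as parameters, so every step inside it can still see them.

```python
from pathlib import PurePosixPath


def child_pipeline(file_name: str, file_id: str) -> dict:
    """Hypothetical stand-in for the child pipeline (read, convert, write, delete).

    The parameters stay in scope for every step of the child, so the archive
    name and the id to delete are both available at the end.
    """
    # Same base name as the Excel file, but with a .csv extension.
    archive_name = PurePosixPath(file_name).with_suffix(".csv").name
    return {"archive_name": archive_name, "delete_id": file_id}


def parent_pipeline(listing: list[dict], mask_suffix: str = ".xlsx") -> list[dict]:
    """Hypothetical stand-in for the parent: list, filter, then call the child."""
    # Assumed mask: keep only documents whose FileName ends with mask_suffix.
    matches = [d for d in listing if d["FileName"].lower().endswith(mask_suffix)]
    # One child execution per matching document, with its fields as parameters.
    return [child_pipeline(d["FileName"], d["FileId"]) for d in matches]
```

Because each child execution gets its own parameters, this also yields one CSV per input file rather than one combined file.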

Thanks for the quick reply.

No, the mask applied picks up only one file. I understand that I might need to handle multiple files in future, but that’s unrelated to my question here.

Just to clarify what you’re saying here, the only way to accomplish this is with a Child Pipeline and a PipelineExecute Snap. There is no way to pass the variable “up” into the Pipeline’s “session”.

Is that correct?

Correct, there is no way to pass a variable “up”. To elaborate on that a bit, the snaps all run in parallel, so there would be a race between the snap that is passing the variable “up” and the snap that is trying to read the variable. For example, if there were two files coming in, the Write snap might see the first file name in one execution and the second file name in another execution.
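As an illustration of that race, here is a small Python simulation. It is only a sketch: the function names are made up, and it replays one possible interleaving sequentially (the upstream step running ahead of the downstream step) rather than modelling how the snaps are actually scheduled.

```python
def run_with_shared_variable(file_names):
    """Simulate passing the file name "up" through a shared variable."""
    shared = {"file_name": None}
    buffered = []
    # Upstream step runs ahead: it overwrites the shared variable for each
    # file and emits that file's contents downstream.
    for name in file_names:
        shared["file_name"] = name
        buffered.append(f"contents of {name}")
    # Downstream step runs later and reads whatever the variable holds now.
    return [(shared["file_name"], content) for content in buffered]


def run_with_field_in_document(file_names):
    """Simulate carrying the file name inside each document instead."""
    docs = [{"file_name": n, "content": f"contents of {n}"} for n in file_names]
    # Each document keeps its own name, so ordering no longer matters.
    return [(d["file_name"], d["content"]) for d in docs]
```

With two files, the shared-variable version pairs the contents of the first file with the name of the second, while the per-document version keeps each name attached to the right contents.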

I can see how that would happen, but could that not be controlled with a ForEach Snap that serializes the processing?

The use case you are thinking of would require a child pipeline because of the race condition. However, that is not a requirement for the pipeline I am developing, as (a) there will only be one file, and (b) it is acceptable to stop processing and throw an error if there is more than one file.
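The "exactly one file, otherwise fail" rule in (a) and (b) could be sketched as a small guard like the one below. This is plain Python for illustration only, and `require_single_file` is a hypothetical helper, not a SnapLogic feature.

```python
def require_single_file(matches):
    """Return the only matching file, or fail if there is not exactly one."""
    if len(matches) != 1:
        # Stop processing instead of guessing which file was meant.
        raise ValueError(
            f"expected exactly one matching file, found {len(matches)}"
        )
    return matches[0]
```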

Hmm, I’m not quite sure what you mean here. The PipeExec snap basically obsoletes the ForEach snap.

That’s not going to be true in the general case. So, we can’t add something so error-prone and hope that people only use it in the right situations.

I believe that this would be very helpful if added to the “Parameters and Fields” documentation.

Thanks @tstack for the suggestion. To be honest, I find this quite a common requirement that is very complicated to implement. Parent/Child pipelines are much harder to debug, and the behaviour is strangely inconsistent. The Box Read snap outputs the “content location”, i.e. the filename, but the CSV/Excel Parser snaps drop everything passed to them except the file contents. A viable solution could be an option for the CSV/Excel Parser to include the originating filename in its output, much like Alteryx does.
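To show what I mean by the parser keeping the originating filename, here is a minimal Python sketch using the standard `csv` module. The function name and the `source_file` field are my own inventions for illustration, not an existing parser option.

```python
import csv
import io


def parse_csv_with_source(file_name, text):
    """Parse CSV text and tag every output row with the file it came from."""
    reader = csv.DictReader(io.StringIO(text))
    # "source_file" is a hypothetical field name for the originating filename.
    return [{**row, "source_file": file_name} for row in reader]
```

With the filename carried on every row, a later write or delete step can read it straight from the document, with no shared variable involved.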

This is another use case that is effectively asking for this feature. The parallel issue comes up again, of course, but that doesn’t negate the functional requirement.