Dynamically parse the excel without having sheet index or sheet name

Hi,
Anyone aware of how do we parse the excel with multisheet without even knowing sheet index / sheet name with having certain number of records in each sheet.
I am having requirement to remove the email ids from first row of first sheet and rest of the sheets as is upload. Please let me know is there any other approach instead of parsing each sheet

Any help is appreciated.

1 Like

Hi @rashmi,

This maybe it’s not the best solution but you could try with recursion, and you will control it from “Excel Parser” snap from error view, you will wait/loop until you receive particular error - Sheet with the provided index does not exist:

Here are some links which can guide you when developing recursive pipelines:
https://community.snaplogic.com/t/recursive-pipeline-solution/6202
https://community.snaplogic.com/t/pipeline-execute-maximum-recursion-calls-depth/12245
https://community.snaplogic.com/t/recursive-logic-in-snaplogic/12921

Thanks,
Pero M.

Thank you @pmancevski for sharing those links. I believe that would be a lot to check on @rashmi’s end.
Instead, lemme loop in the experts who can help her on this one.

CC: @dmiller @koryknick @ptaylor @bojanvelevski @siwadon

@rashmi - this is a fairly complicated issue for a couple reasons. First, SnapLogic does not have a snap to read multiple sheets into a single stream. Second, the Excel Multi Sheet Formatter requires pre-determined, named views as input to write multiple sheets. So if you have input files with different number and names of sheets, there is no out of the box solution for you here.

This does not mean there is no hope; it’s just a more technical discussion. Excel files are basically zip archives, which you can read using the ZipFile Read snap and use a Binary Router snap to check the
$['content-location'] value for the first ‘xl/worksheets/*’ reference. From here, it will take some work on your part on how you can remove that string from the content and push it back with the rest of the Excel file content into the ZipFile Write to reconstitute the file.

This work is more than I can spend time to provide a working example for you here, but is an interesting problem. I hope you share your findings!

2 Likes

As is typical for me, I just couldn’t leave this one alone. Here is a solution I came up with based on this discussion. It simply removes the value from A1 of the first sheet.
Community 13302 - Excel Remove A1 from sheet1_2022_09_01.slp (9.0 KB)
Please replace the files with your own. Note that when you download, the output file may switch to .zip, so you just need to rename it so you can open it as an Excel file.

If you intend to use this, please test it thoroughly as I would consider this to be a Proof of Concept solution.

2 Likes

Thanks a lot @koryknick!! this approach is good but I was thinking is there any way to remove that row itself than keeping first row as empty. Please could you help i mean to say instead of replace fuction in mapper can we use some function to remove that row itself?

Thank you so much @darshthakkar,@pmancevski

2 Likes

@rashmi - I’ve looked into this a bit and actually removing the row is quite a bit more technical due to internal references that Excel uses within the Sheets. This level of technical complexity is outside the scope of what I can spend on a Community response. If someone else solves this, I would love to see the solution!

No problem @koryknick Thanks a lot for helping

@rashmi - Here is an updated pipeline that will remove the first row. The pipeline itself is a little less elegant now, but the concept is pretty simple. The top path is basically the original ZipFile Read to get the elements of the Excel file, then discards the sheet1 data. The second path that starts with the File Reader is using the Excel Parser to read the sheet1 contents, discards the first row, then creates a temporary Excel file with only the sheet1 data - then we read that temporary Excel file with the ZipFile Read snap and discard everything except the sheet1 data, then merge that with the top path to create the target Excel file.

Community 13302 - Excel Remove A1 from sheet1_2022_09_07.slp (24.5 KB)

This seems to work with my testing files, but note that the Excel Formatter does not support cell formatting (bold, italics, background color, cell spanning, etc.) so you would lose that in sheet1 with this method.

A couple things to note in this example:

  • Temporary file uses the pipe.tmpDir built-in property - I love this feature since it references a temporary directory that is automatically deleted when the pipeline ends
  • The “unused” or “garbage” paths of the Binary Routers are using a Script snap (Consume input) to consume the unwanted input documents so we don’t have any open output ports in the pipeline. This is technically only required when a pipeline is used as a Triggered Task or as a child pipeline called via Pipeline Execute from another pipeline, but it is a recommended best practice to close unused open output ports in your pipeline.