cancel
Showing results for 
Search instead for 
Did you mean: 

Dynamically parse the excel without having sheet index or sheet name

rashmi
New Contributor III

Hi,
Anyone aware of how do we parse the excel with multisheet without even knowing sheet index / sheet name with having certain number of records in each sheet.
I am having requirement to remove the email ids from first row of first sheet and rest of the sheets as is upload. Please let me know is there any other approach instead of parsing each sheet
excelsheetsample

Any help is appreciated.

11 REPLIES 11

koryknick
Employee
Employee

As is typical for me, I just couldn’t leave this one alone. Here is a solution I came up with based on this discussion. It simply removes the value from A1 of the first sheet.
Community 13302 - Excel Remove A1 from sheet1_2022_09_01.slp (9.0 KB)
Please replace the files with your own. Note that when you download, the output file may switch to .zip, so you just need to rename it so you can open it as an Excel file.

If you intend to use this, please test it thoroughly as I would consider this to be a Proof of Concept solution.

rashmi
New Contributor III

Thanks a lot @koryknick!! this approach is good but I was thinking is there any way to remove that row itself than keeping first row as empty. Please could you help i mean to say instead of replace fuction in mapper can we use some function to remove that row itself?

koryknick
Employee
Employee

@rashmi - I’ve looked into this a bit and actually removing the row is quite a bit more technical due to internal references that Excel uses within the Sheets. This level of technical complexity is outside the scope of what I can spend on a Community response. If someone else solves this, I would love to see the solution!

rashmi
New Contributor III

No problem @koryknick Thanks a lot for helping

koryknick
Employee
Employee

@rashmi - Here is an updated pipeline that will remove the first row. The pipeline itself is a little less elegant now, but the concept is pretty simple. The top path is basically the original ZipFile Read to get the elements of the Excel file, then discards the sheet1 data. The second path that starts with the File Reader is using the Excel Parser to read the sheet1 contents, discards the first row, then creates a temporary Excel file with only the sheet1 data - then we read that temporary Excel file with the ZipFile Read snap and discard everything except the sheet1 data, then merge that with the top path to create the target Excel file.

Community 13302 - Excel Remove A1 from sheet1_2022_09_07.slp (24.5 KB)

This seems to work with my testing files, but note that the Excel Formatter does not support cell formatting (bold, italics, background color, cell spanning, etc.) so you would lose that in sheet1 with this method.

A couple things to note in this example:

  • Temporary file uses the pipe.tmpDir built-in property - I love this feature since it references a temporary directory that is automatically deleted when the pipeline ends
  • The “unused” or “garbage” paths of the Binary Routers are using a Script snap (Consume input) to consume the unwanted input documents so we don’t have any open output ports in the pipeline. This is technically only required when a pipeline is used as a Triggered Task or as a child pipeline called via Pipeline Execute from another pipeline, but it is a recommended best practice to close unused open output ports in your pipeline.