Using sl.zipObject with Headerless CSV

Hoping someone might be able to find a solution to a challenge I have. I have a headerless csv file. I’m also reading in a header array that I wish to match up with the data in the headerless csv. I don’t want to add the header values to the CSV Parser snap so that I can avoid creating a custom parser for a multitude of headerless files that I might receive. Basically, here’s my situation.

“headers”: [“employeeId”, “firstName”, “lastName”]

The csv file looks like this after going through the CSV Parser.

“field1”: “1001”, “field2”: “Sylvester”, “field3”: “Stallone”
“field1”: “1005”, “field2”: “John”, “field3”: “Wick”
“field1”: “1010”, “field2”: “Arnold”, “field3”: “Schwarzenegger”

The results I desire should look like this:

“employeeId”: “1001”, “firstName”: “Sylvester”, “lastName”: “Stallone”
“employeeId”: “1005”, “firstName”: “John”, “lastName”: “Wick”
“employeeId”: “1010”, “firstName”: “Arnold”, “lastName”: “Schwarzenegger”

I’ve been playing around with sl.zipObject, but I can’t seem to make it work correctly. Hints, tips, and tricks would be greatly appreciated.
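For readers following along, here is a minimal Python sketch of the pairing that a zipObject-style function performs: one array of keys, one array of values, matched by position. It is an illustration only, not SnapLogic expression syntax, and `zip_object` is a hypothetical helper name.

```python
def zip_object(keys, values):
    """Pair each key with the value at the same position.

    Hypothetical helper for illustration. Python's zip() stops at the
    shorter input, so extra keys or extra values are silently dropped.
    """
    return dict(zip(keys, values))


headers = ["employeeId", "firstName", "lastName"]
row_values = ["1001", "Sylvester", "Stallone"]

record = zip_object(headers, row_values)
print(record)
```

Note that the function needs the row as a plain array of values, which is not what the parser output above provides.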

Thanks, Alex

Hi @alex.panganiban.guild,

Below is my pipeline that simulates your situation. Could you map field1, field2, and field3 to your corresponding field names after the CSV Parser?

community_pipeline_V1.0_2022_11_23.slp (6.5 KB)

Let me know if this works for you

Regards

Jens

Thank you, Jens, for your suggestion. It’s certainly a valid solution; however, given our application architecture, it doesn’t fit our design pattern. We are moving away from customized mappings and making everything configurable instead. For this reason, I focused on the sl.zipObject method to automatically align our data with configured headers. Here’s an example.

Say we have 2 customers that send us employee headerless data (we actually have more, which makes it even more important to develop a non-customized strategy that works for every customer). We don’t enforce any strict schemas or column ordering on our customers, so they can send the data elements in any order.

For customer ABC, we have a header configuration of “[employeeId, firstName, lastName, hireDate].” ABC sends their data like this, with the CSV parser results afterwards:

“1001”, “Sylvester”, “Stallone”, “2001-08-21”
“1010”, “Nicole”, “Kidman”, “1997-05-18”

“field001”: “1001”, “field002”: “Sylvester”, “field003”: “Stallone”, “field004”: “2001-08-21”
“field001”: “1010”, “field002”: “Nicole”, “field003”: “Kidman”, “field004”: “1997-05-18”

For customer XYZ, we have a header configuration of “[DOB_Date, First_Name, ID, Last_Name].” XYZ sends their data like this, with the CSV parser results afterwards:

“1960-07-04”, “Bobby”, “99999”, “Fischer”
“2005-04-27”, “Billie Jean”, “88888”, “King”

“field001”: “1960-07-04”, “field002”: “Bobby”, “field003”: “99999”, “field004”: “Fischer”
“field001”: “2005-04-27”, “field002”: “Billie Jean”, “field003”: “88888”, “field004”: “King”

Without having to create a separate parser with embedded header names for each customer, and without having to use a custom mapper for each customer, this is what I want to achieve for each respective customer.

ABC:

“employeeId”: “1001”, “firstName”: “Sylvester”, “lastName”: “Stallone”, “hireDate”: “2001-08-21”
“employeeId”: “1010”, “firstName”: “Nicole”, “lastName”: “Kidman”, “hireDate”: “1997-05-18”

XYZ:

“DOB_Date”: “1960-07-04”, “First_Name”: “Bobby”, “ID”: “99999”, “Last_Name”: “Fischer”
“DOB_Date”: “2005-04-27”, “First_Name”: “Billie Jean”, “ID”: “88888”, “Last_Name”: “King”
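As a rough sketch of the configuration-driven idea, the Python below keeps one header list per customer and applies it to headerless CSV text. It is not SnapLogic; `HEADER_CONFIG` and `label_rows` are hypothetical names, and the configuration is assumed to live in a simple in-memory dictionary.

```python
import csv
import io

# Assumed configuration: one header list per customer, in the order
# that customer sends its columns.
HEADER_CONFIG = {
    "ABC": ["employeeId", "firstName", "lastName", "hireDate"],
    "XYZ": ["DOB_Date", "First_Name", "ID", "Last_Name"],
}


def label_rows(customer, raw_csv):
    """Parse headerless CSV text and key each row by the customer's headers."""
    headers = HEADER_CONFIG[customer]
    # skipinitialspace handles the space after each comma in the samples.
    reader = csv.reader(io.StringIO(raw_csv), skipinitialspace=True)
    return [dict(zip(headers, row)) for row in reader if row]


xyz_data = (
    '"1960-07-04", "Bobby", "99999", "Fischer"\n'
    '"2005-04-27", "Billie Jean", "88888", "King"\n'
)
xyz_records = label_rows("XYZ", xyz_data)
for rec in xyz_records:
    print(rec)
```

The same function serves every customer; only the configuration entry differs.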

I’ve had partial success using ideas from the threads “Transforming JSON Data” and “JSON returns Column Names separately from the Rows” (post #5 by Garrett), together with the sl.zipObject method. However, after going through the parser, my data carries the field001, field002, etc. keys, which is where I’m running into a wall. I feel that if I could create an array of values for each row of my data, without the keys, this might be the solution I’m looking for.

If I could make my post parser data transform from this:

“field001”: “1960-07-04”, “field002”: “Bobby”, “field003”: “99999”, “field004”: “Fischer”

to this, I feel like I could achieve my goal, because once I have the data array, the sl.zipObject method should work exactly as it does in the links I referenced above.

[“1960-07-04”, “Bobby”, “99999”, “Fischer”]
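The transformation described here can be sketched in Python as follows: strip the generated keys, keep the values in column order, then pair them with the configured headers. This is an illustration under assumptions, not SnapLogic syntax; `values_in_field_order` is a hypothetical helper, and it assumes the parser's keys end in the column number (field001, field002, and so on).

```python
import re


def values_in_field_order(parsed_record):
    """Return the record's values as a list, ordered by the numeric
    suffix of each generated key (field001, field002, ...)."""

    def field_number(key):
        match = re.search(r"(\d+)$", key)
        # Keys without a numeric suffix sort first (assumption).
        return int(match.group(1)) if match else 0

    return [parsed_record[k] for k in sorted(parsed_record, key=field_number)]


parsed = {
    "field001": "1960-07-04",
    "field002": "Bobby",
    "field003": "99999",
    "field004": "Fischer",
}
headers = ["DOB_Date", "First_Name", "ID", "Last_Name"]

values = values_in_field_order(parsed)
record = dict(zip(headers, values))
print(values)
print(record)
```

Sorting by the numeric suffix rather than by the raw key keeps the order correct even if the keys are not zero-padded (field1 ... field10).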

Anyways, thanks again Jens. Even though yours wasn’t the exact solution I was looking for, I truly do appreciate your time and generosity in responding to my plea for help. :slight_smile:

Alex


I got my solution! @Spiro_Taleski, thank you so much for your wisdom! This is what Spiro showed me.

I applied his solution to the bigger challenge I was having and this sample pipeline does exactly what I needed it to do.

sample_ConfiguredHeadersToData_2022_11_23.slp (8.0 KB)

Thank you all!

Alex
