How to remove blank rows from an input flat file?

Hi Team,

How can I remove all the blank rows from an input flat file? I cannot use a condition restricted on a single column as the other records are needed. All I want to do is filter out ALL ROWS that doesn’t have any data (like Row 4, 5, 7, 8 shown in the screenshot below)

image

Desired Output:
image

Thus, final output should be:
image

Thanking in advance.

Regards,
DT

Hi @darshthakkar,

After parsing the file try using a filter with the following expression $ != {}.

Let me know if this helps,

BR,
Aleksandar.

1 Like

Sure @AleksandarAngelevski , I will give it a try,
I tried with isEmpty() both on filter and router but it didn’t work, was planning to put a condition on all the columns as the last resort (but that is not a good design to be honest).

This is not fetching the desired results.

This is maybe a little bit difficult, but you can put the following filter after your parser
The expression is: $.entries().filter(v => v[1] != "").length > 0 which will find any instance where all of the values for the given input document are all empty and filter them out.

1 Like

Hi @darshthakkar,

I think also it will do the trick

$.entries() != []

Thanks,
Pero M.

Thank you @ddellsperger, I will definitely give this a try.
Before posting this to community, I tried with $.filter((v,k)=> v != null) but didn’t get any success.

Will keep you posted with my findings. Thanks again.

Regards,
Darsh

Thank you @pmancevski for your suggestion and assistance on this one. I tested with your solution and unfortunately it didn’t work.

@ddellsperger: Voila, it worked! Thank you for your assistance. I will now remove my conventional filter which has the condition of ! = null on each column in the file (it was never a good design but my only resort until we could find a best solution).

Thanks again.

BR,
Darsh

1 Like

@ddellsperger: Is it a safe assumption that the below solution provided by you is similar to the conventional filter I used (which was putting a NOT NULL condition on each column?) Q-1

I believe my conventional filter would be a lot of overhead on the system + this time, I only had to deal with 10 columns (so 20 conditions, one as !=null and another as !=“”) however if I had to work on 350+ columns (which is my ideal day scenario), conventional filter would have busted.

I want to take this opportunity to learn from an expert like you that what would be the overhead (time & space complexity) while using $.entries().filter(v => v[1] != “”).length > 0? Q-2

Is the above a recommended approach and the ONLY approach to filter out blank rows from a flat file? Q-3

Looking forward to hearing from you.

BR,
Darsh

A-1
It’s safe to say it’s similar, you could achieve similar with the following filter directly on the object (as a bit of a double verification doing both null and empty string checking):
!$.filter((v, k) => v != null && v != "").isEmpty()

A-2
This space and time complexity is fairly low, you’re talking O(n) when it comes to this validation and verification as it will go through every piece in the input document to check for values, we do a filter and length (creates an array and provides us the length of that) to then check for the size, the other option above is similar in that route.

A-3
There are some other ways to potentially do this, but I wouldn’t really call them perfect solutions, your flat file is likely a CSV and as a result, there’s some limitations with the CSV Parser to get those empty rows removed, and I think that’s a place where the snap can potentially be improved (I’ll have a feature request logged internally to see if we can help out in this case). Some of our snaps have better support for this than others, I believe the Excel Parser has some option to exclude empty rows from the output, but that’s using a different way to parse, so it’s slightly skewed in how it operates.

1 Like

Thanks a TON :heart: for the detailed explanation @ddellsperger, really appreciate your time and help on this. I tried to put $.isEmpty() == false as well as !$.isEmpty() under router and filter but didn’t find any luck and it now makes sense as I was missing out a vital condition. Thanks for sharing that as well.

Agreed on the CSV limitations part, excel is much straightforward than this one. Another thing (this is off-topic) I’ve noticed especially with CSV file (CSV formatter though) is that when we go ahead UTF-8 binding under character set, the file generated is not good-to-use as it prints the data on the next line; the strange part was the length at which it used to break wasn’t consistent. I had a use case to generate csv files couple of months ago, it failed miserably so I went ahead with excel file and then converted it manually using UTF-8.

In addition to this, hyperlinks don’t work on CSV, even if you write transformations on snapLogic, the end result on the CSV would be a formala displayed as a “Text”.

Moroeve, I have also observed that certain .csv files that gets generated from snapLogic are larger than their potential .xlsx (I usually develop the pipelines where the backup files gets saved on a network drive and this is my observation from that). I can keep you posted with my observations particularly on CSV snaps offline, if I find more. Don’t want to spam this thread more and I apologize for going off-track on this one.

Thank you for logging a feature request, will keep an eye if there is any enhancement to the snap.

Thanks again, happy holidays and have a good one ahead :slight_smile:

BR,
Darsh

Just so you know, I tested this out and it works as well :+1:

Or… $.values().join(‘’).trim().length > 0

:smile:

I will try this as well and keep you posted whether it worked or not.
Thanks for your help though @alchemiz

Will JSON.stringify($) != “{}” works? I feel this is simpler.

@smudassir: I haven’t tried this but can surely give a try and keep you posted. Should the input stream be in any specific format?

BR,
Darsh

Just try on the data you had earlier.

Sure, will keep you posted after trying that solution. Thank you.