Unable to write out error view records as they are not showing in preview

Hi All,

I am trying to process millions of records, sometimes even billions. During validation, I am not seeing a preview for the error view because of the 2000-record preview limit. When I execute the pipeline, I get a failure saying a few field values or records are incorrect out of the billions. I want to write them out using the error view, but because of the preview limitation during validation, I couldn't select them in the subsequent snaps after the error view.

My objective is to write all the error records, with the header, reason, and resolution, to a file for further analysis. Is there a way to do that, given that my error records fall beyond the 2000-record preview limit, so that I have the option to select the required fields before validation?

Please help.

Regards,
Amar

Hi @amardeep2021,

You can always use an SLDB file for debugging. Add a JSON Formatter to the error view, and a File Writer to write the file to SLDB.

After the file is stored, you can use it to construct a solution for storing the errors in a different format/location (replace the file-writing part).
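
A minimal sketch of that wiring (the file name is just an example of mine, not from the reply above):

[failing snap] error view → JSON Formatter → File Writer (File name: error_records.json, written to the project SLDB)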

Hope this helps,
Bojan

Hi Amar,

Good day! You can also use the Data Validator snap, where you can set specific rules and even patterns.

Thanks,
EmEm

Since the file is quite large, storing it in SnapLogic may not be possible. As an alternative, I am trying to identify non-numeric values coming in numeric fields in the upstream documents and separate out the non-numeric records for a particular field so I can fix them. I tried a Filter with isNaN($a) == 'true', but it didn't work. Any suggestions? PFA the snaps I used.

Regards,
Amar

That's why I suggested using the error file and constructing a solution that writes the errors elsewhere. You can try using the instanceof or typeof operators in the Filter:

$a instanceof Number

typeof $a == 'number'

Or modify your actual expression:

isNaN(parseInt($a)) == true

What you were trying so far doesn't work because you are checking whether the $a field itself is NaN. NaN is the result you get when you try to parse a value that is Not a Number. That's why you need to try to parse the value first: if the value is a string, a date, or anything else non-numeric, you'll get NaN and your expression will be true. But be careful: numbers stored as strings will also be affected, which is why you should probably go with the operators above.

Or you can also use isNaN($.get('a', 'isnull')), since isNaN(null) on its own would return false.
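
For reference, here is how those expressions typically evaluate, assuming the expression language follows standard JavaScript semantics for parseInt and isNaN (the sample values below are mine, for illustration only):

isNaN(parseInt("25")) → false ("25" parses to 25)
isNaN(parseInt("abc")) → true (nothing parses, so parseInt returns NaN)
isNaN(parseInt("2/5")) → false (parseInt reads the leading digits and stops at the "/")
isNaN(null) → false (null coerces to 0, which is why nulls need the separate handling above)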

Thanks all. I am trying to write only the records that have a non-numeric value for $a, using the snaps below and the filter condition isNaN(parseInt($a)) == true. The pipeline completes successfully. The Dashboard status shows 90 million documents going into the Filter and 30K coming out. Since I added a comment in the Mapper after the Filter snap, I see the file with 30K rows containing that comment, but no other fields are passed to the final file except the comment. I tried pass-through and also pulled the fields manually in the Mapper after the Filter, with the same result. Does anyone know the reason and a possible resolution for this? Kindly help.

Regards,
Amar.

One additional piece of information: the column headers are written to the final file along with the comment I added in the Mapper, but no values are written for the columns in any of the 30K rows. Any idea how to fix this? Please help.

Regards,
Amar.

Try validating whether values are actually reaching the Filter. Additionally, it would be helpful if you could take a screenshot of the Mapper after the Filter.

Hi Bojan,

I can see the input/output counts in the Dashboard as attached below. I see the same number of rows in the final file, with the header, empty values, and the stat column with the value ageerror. PFA the Mapper screenshot. I tried pass-through as well, with the same result.


Regards,
Amar

Can you share one more, from the CSV Formatter?

Hi Bojan,

PFA the CSV Formatter screenshot.

Does anyone know the reason the column values are not making it into the output file, or how to resolve this glitch?

Any tips? Please help.

It seems possible to me that you have 30K empty rows in your input. If $a is an empty string, then your expression isNaN(parseInt($a)) == true will evaluate to true, thus passing the empty row to the output.

Try adding a Filter snap (either after each of your CSV parsers or prior to Filter1) with the following Filter expression: ((a) => a.length)(jsonPath($,"$.*").join().replace(/,/g,"")) != 0

This filter should remove any empty rows before being processed downstream.
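
To see how that expression behaves, take a sample empty row such as {"a": "", "b": "", "c": ""} (my own example, assuming standard jsonPath and JavaScript string behaviour):

jsonPath($,"$.*") → ["", "", ""]
["", "", ""].join() → ",,"
",,".replace(/,/g,"") → ""
"".length → 0, so the row is dropped

A non-empty row like {"a": "25", "b": "x", "c": ""} reduces to "25x" (length 3), so it is kept and passed downstream.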

The expression and pipeline work fine and generate a file with all records and column values for the opposite condition, isNaN(parseInt($a)) == false.

But for the negative scenario it creates a file with the header and only the comment column populated, for 90 million records.

Does anyone know why this happens only when I use the expression isNaN(parseInt($a)) == true in the filter, as I have not changed anything else?

The settings screenshots I sent above are the same. Kindly help me resolve this, as it is very high priority for me.

Regards,
Amar.

Do you by any chance receive only numbers and string versions of numbers in the $a field? E.g. 1 or "1"?

I am supposed to receive only numbers in this field. When I try loading to Redshift, I see the error below, so I decided to separate out the non-numeric records and correct them with the pipeline above. It processes, but there are no values in the final file. To answer your question, I am seeing data like below in the JSON preview after the Union.

[Screenshot attached: Screen Shot 2021-11-12 at 10.50.30 AM]

Redshift Bulk Load error view message:
Error loading document 5658362 : Column age : Error code 1207 : "Invalid digit, Value ‘/’, Pos 2, Type: Integer ";Error loading document 5658364 : Column age : Error code 1207 : "Invalid digit, Value ‘/’, Pos 2, Type: Integer ";Error loading document 5658366 : Column age : Error code 1207 : "Invalid digit, Value ‘/’, Pos 2, Type: Integer ";Error loading document 5658368 : Column age : Error code 1207 : "Invalid digit, Value ‘/’, Pos 2, Type: Integer ";Error loading document 5658370 : Column age : Error code 1207 : "Invalid digit, Value ‘/’, Pos 2, Type: Integer ";

Well, this is a string. Should this value pass the filter or not? Because with your current expression, it will be considered a number.
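
To make that concrete (my example, assuming JavaScript semantics for parseInt and isNaN):

isNaN(parseInt("25")) == true → false (a string like "25" is not routed to the error file)
isNaN(parseInt("/25")) == true → true (a value starting with a non-digit is routed to the error file)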

Do you mean I need to check for $age == null? This field is getting numbers as strings, and for some records it's a special character (no specific pattern). In that case I just need to filter the records which don't carry a number as a string. Any suggestions?