10-26-2021 02:39 PM
Hi All,
I am trying to load a new Redshift table which I have created. While validating, I am getting:
"error_code": 1207
"error_message": "Invalid digit, Value '.', Pos 9, Type: Integer "
"column_value": "595546349.04 "
I have tried altering the column data type of this Redshift table to DOUBLE PRECISION and to REAL, but I still get the above error. I also applied parseFloat() to this field before loading into Redshift; that did not work and gave the same error. Does anyone know how to address this? Please let me know.
Regards,
Amar
10-27-2021 02:23 PM
You still have this issue, but I now can't reproduce the problem against a real or double field. I may have misspoken, and I have since torn down the pipeline that I was sure was getting a 1207 validation error for a number with a trailing space being inserted into a real field. Let's back up.
I've attached a pipeline with just two snaps, a JSON Generator and a Redshift BulkLoad.
Here's the payload for the generator; notice the trailing space in the number value.
[
  {
    "id": 1,
    "bigdecimal": "595546349.04 "
  }
]
Let's create the Redshift table to bulk load into. Observe that the number column's data type is integer, so we expect problems.
CREATE TABLE t1207 (
id int,
bigdecimal integer
);
Sure enough, the BulkLoad snap reports this error. We expect this.
[
  {
    "error": "Invalid digit, Value '.', Pos 9, Type: Integer",
    "status_code": "error",
    "line_number": 1,
    "column_name": "bigdecimal",
    "error_code": 1207,
    "column_value": "595546349.04",
    ...
  }
]
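As an aside, the same failure is also recorded on the Redshift side in the STL_LOAD_ERRORS system table, if you'd rather check there. A quick sketch (assumes your user can see the load's entries):
-- Most recent load errors first; colname, err_code, and err_reason
-- correspond to the fields the snap surfaces in its error view.
SELECT colname, type, err_code, err_reason, raw_field_value
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 5;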
The type in the error message, Integer, indicates the type of the column we're trying to insert into. OK, let's drop the table and re-create it with the real data type.
CREATE TABLE t1207 (
id int,
bigdecimal real
);
Re-running the pipeline succeeds without having to trim the number value. That actually fits with your report: you're still having the problem after trimming, which indicates the problem is something else.
[
  {
    "id": 1,
    "bigdecimal": 5.9554637E+8
  }
]
However, we have lost some precision; real is a single-precision (4-byte) float, and this is a big number. So, let's drop the table and re-create it as double precision, which is probably what we want.
CREATE TABLE t1207 (
id int,
bigdecimal double precision
);
Pipeline succeeds and full precision is preserved.
[
  {
    "id": 1,
    "bigdecimal": 595546349.04
  }
]
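As a side note, you can reproduce that precision difference with a quick cast comparison directly in Redshift, independent of the pipeline (a sketch; exactly how the real value is displayed can vary by client):
-- real is a 4-byte float, double precision is 8 bytes
SELECT CAST(595546349.04 AS real)             AS as_real,
       CAST(595546349.04 AS double precision) AS as_double;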
I tried changing the JDBC driver from Postgres to Redshift and get the same result with either. I thought maybe the Redshift driver was handling the space trimming, but apparently the trailing space isn't a problem with either driver.
As I mentioned, BulkLoad first writes data to a CSV file, zips it, and then stores this in S3 for Redshift to source from. My Redshift account and BulkLoad snap don’t have any special properties defined.
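Under the hood that amounts to a COPY from the staged file, something roughly like the following (my assumption about what the snap issues; the bucket, file name, and IAM role below are placeholders):
COPY t1207 (id, bigdecimal)
FROM 's3://your-bucket/staged-file.csv.gz'                    -- placeholder staged object
IAM_ROLE 'arn:aws:iam::123456789012:role/your-redshift-role'  -- placeholder credentials
CSV
GZIP;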
Maybe you can share a bit more about your configuration, settings, or anything else that looks suspect.
10-27-2021 02:41 PM
Thanks, Matthew. I tried dropping the table and re-creating it with REAL, DOUBLE PRECISION, and even VARCHAR(100). I am still getting the same error. Here is the full error message from the bulk load error view.
"reason": "Error occurred at line 1935, for column name annual_revenue , column value 105986489.63 , row value 105986489.63.0011B000028GUpJ.Solita Oy.324.USD.12/5/2019.10/30/2019.6/1/2020 4:48:00 AM.11/14/2019.15.Tier3.5/7/2020.10/30/2019.10/30/2019.1/7/2020 2:24:00 PM.Time Tracking.12/10/2018.11/30/2018.4/9/2020.825…28.6.9/1/2021.Q3-2020.Closed…Yes.Professional Services.Outbound Prospecting (CSD).TimeTracking.0061B00001bB6VEQA0.Solita Platform.Ville Kivinummi.Nordics ME Team.4/9/2020.11/14/2019.5/4/2020 4:48:00 AM…True.NEZA.Nordics.TimeTrackingTier3RoleLookup.TimeTrackingTier3RoleMatch.1.Emerging.0.Tier3.7-Post Prod Support.674.9- Closed/Won.TimeTrackingTier3StageLookup.TimeTrackingTier3StageMatch.1.Customer Base Nordics.Closed.2/27/2020.5/7/2020.LDPTimeTracking , with error code 1207 and error_message Invalid digit, Value '.', Pos 9, Type: Integer ."
"resolution": "Please ensure all required values are set and the data types/values match the columns defined on the table."
"status_code": "error"
"line_number": 1935
"column_name": "annual_revenue "
"error_code": 1207
"error_message": "Invalid digit, Value '.', Pos 9, Type: Integer "
"column_value": "105986489.63 "
"row_value": "105986489.63.0011B000028GUpJ.Solita Oy.324.USD.12/5/2019.10/30/2019.6/1/2020 4:48:00 AM.11/14/2019.15.Tier3.5/7/2020.10/30/2019.10/30/2019.1/7/2020 2:24:00 PM.Time Tracking.12/10/2018.11/30/2018.4/9/2020.825…28.6.9/1/2021.Q3-2020.Closed…Yes.Professional Services.Outbound Prospecting (CSD).TimeTracking.0061B00001bB6VEQA0.Solita Platform.Ville Kivinummi.Nordics ME Team.4/9/2020.11/14/2019.5/4/2020 4:48:00 AM…True.NEZA.Nordics.TimeTrackingTier3RoleLookup.TimeTrackingTier3RoleMatch.1.Emerging.0.Tier3.7-Post Prod Support.674.9- Closed/Won.TimeTrackingTier3StageLookup.TimeTrackingTier3StageMatch.1.Customer Base Nordics.Closed.2/27/2020.5/7/2020.LDPTimeTracking "
If you find something, please help.
Regards,
Amar.
10-27-2021 02:45 PM
What is the data type of annual_revenue? The error suggests it is an integer, and not a decimal number. Please verify for me.
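If it helps, you can read the declared type straight from the catalog with something like this (the table name below is a placeholder):
-- SVV_COLUMNS lists the declared data type for each column
SELECT column_name, data_type
FROM svv_columns
WHERE table_name = 'your_table'          -- placeholder
  AND column_name = 'annual_revenue';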
10-27-2021 03:05 PM
The incoming data upstream of the Mapper has a mix of numbers, for example 47577 and 595546349.04. I have 3 Mapper snaps followed by the bulk load. Before these 3 Mappers I have a CSV Parser snap, as I am reading a Google Sheet. In the 1st and 2nd Mappers the input schema is string and the target schema is any; in the 3rd Mapper the input schema is string and the target schema is integer. I have removed all the parseFloat() and trim() expressions from my pipeline and am using double precision in the create table. I am still getting the same error during validation, with the same message shared before.
Not sure why!
Regards,
Amar.
10-27-2021 03:10 PM
The error you are receiving is from the Redshift database trying to insert a decimal into an integer field. Can you tell me the data type of the annual_revenue field? It seems it is defined as an integer, but we're trying to store a decimal number into that field. All the snap data transformations, etc., are relevant, but what really matters is the final value getting inserted into the target database, and it seems there is a mismatch there.
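For what it's worth, once the column really is double precision, a Mapper expression along these lines, using the parseFloat() and trim() you already tried, should hand Redshift a clean numeric value instead of a string with a trailing space (the field name is taken from your error output):
parseFloat($annual_revenue.trim())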