Error with JDBC insert into PGSQL

I have a pipeline that reads from Oracle and writes to PGSQL. When inserting to PGSQL, I get this error

Batch entry 0 insert into "pg_temp"."crmi002b_ut_pprs001" ("end_date", "opportunity_type", "sales_person", "opportunity_status_type", "sr_partition_id", "program_id", "promotion_id", "contact_id", "created_by", "last_updated_by", "sr_checksum", "opportunity_id", "sr_modification_time", "entity_code", "created_on", "last_updated_on", "sr_creation_time", "start_date") values (cast(NULL as varchar), 'LOV_OT_INTERACTION', 'C_NACAR', 'LOV_OS_INTERACTION', '22'::numeric, cast(NULL as varchar), cast(NULL as varchar), '151000000'::numeric, 'C_NACAR', 'C_NACAR', 'f1503e2c1ad94f56d9deef140da28ead', '122000000'::numeric, '1569965343729'::numeric, cast(NULL as varchar), cast('2010-12-11 10:51:24-06' as timestamp), cast('2010-12-11 10:51:24-06' as timestamp), '1569965343729'::numeric, cast(NULL as varchar)) was aborted: ERROR: column "end_date" is of type timestamp without time zone but expression is of type character varying
Hint: You will need to rewrite or cast the expression.
Position: 364 Call getNextException to see other errors in the batch., error code: 0, SQL state: 42804

I am unsure why the snap is trying to cast NULLs as varchar for date and int fields. I assume it is because it does not see a good first example with data, so it falls back to varchar instead of translating the type properly, as it does for the fields that have data.
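If the snap really does infer column types from sample values, the failure mode described above can be sketched like this. This is a toy model of the hypothesis only, not SnapLogic's actual code; the function name and the varchar fallback are assumptions:

```python
# Toy model of per-value type inference with a varchar fallback.
# NOT SnapLogic's implementation; it only illustrates the hypothesis
# that a NULL in the first record degrades the column to varchar.
from datetime import datetime

def infer_sql_type(value):
    """Guess a SQL type from a sample value; NULL carries no information."""
    if isinstance(value, bool):          # check bool before int (bool is an int subclass)
        return "boolean"
    if isinstance(value, int):
        return "numeric"
    if isinstance(value, datetime):
        return "timestamp"
    # None (and anything unrecognized) falls back to varchar -- the assumed
    # source of cast(NULL as varchar) in the failing insert.
    return "varchar"

first_record = {"end_date": None, "contact_id": 151000000}
types = {col: infer_sql_type(v) for col, v in first_record.items()}
# end_date is wrongly typed as varchar, contact_id correctly as numeric
```

Under this model, any pipeline whose first record has a NULL in a timestamp or numeric column would hit the same error, which matches the behavior reported.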

Is there a way to bypass this casting, or to have it fixed? This is not the only pipeline that runs from Oracle to PGSQL, and not the first one that has null values in non-text fields. I find it hard to believe that in all the other pipelines like this one, the first record always has data in every field.

Unsure if this is the reason, but we switched schemas from pg_temp to a regular schema and it works fine. This is the first time we are trying to use pg_temp, and it does not work with the snap. Is the snap trying to get the table definitions through a different connection? It would not see the table in the case of pg_temp.
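In PostgreSQL, pg_temp is session-local: a temporary table created on one connection is invisible on every other connection. A minimal sketch of why a metadata lookup on a second connection would miss the table (connections modeled as plain Python objects, not real JDBC sessions):

```python
# Toy model: each connection gets its own private temp namespace,
# while regular schemas are shared. Not real database code.
class FakeDatabase:
    def __init__(self):
        self.shared_tables = set()        # e.g. tables in a regular schema

class FakeConnection:
    def __init__(self, db):
        self.db = db
        self.temp_tables = set()          # pg_temp: visible only to this session

    def create_table(self, name, temp=False):
        (self.temp_tables if temp else self.db.shared_tables).add(name)

    def table_exists(self, name):
        return name in self.temp_tables or name in self.db.shared_tables

db = FakeDatabase()
conn_a = FakeConnection(db)   # connection that created the temp table
conn_b = FakeConnection(db)   # connection doing the metadata query

conn_a.create_table("crmi002b_ut_pprs001", temp=True)
# conn_a sees the temp table; conn_b does not, so a metadata query on
# conn_b would find no column types and could fall back to varchar.
```

If the snap's metadata query ran on a different connection than the insert, this would explain why switching to a regular (shared) schema made the pipeline work.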

It’s… complicated. You should generally assume that different snaps use different connections. In rare situations it may look like a connection is passed from one snap to the next, but there should be a reset between snaps; e.g., with PostgreSQL we reset the default schemas.

Within an individual snap it’s… complicated. There’s usually a single connection used during both validation and execution. However, if the snap is at the end of a pipeline with a long delay between launch and the first documents hitting the snap, it’s possible that it will use a different connection. It’s theoretically possible for a connection to be changed during execution, but I don’t think that happens in practice. We shouldn’t do anything during validation that’s required during execution, but something could have slipped through.

Could you write up two tickets? One with enough details about this pipeline for us to try to duplicate it, and a more general one asking for support for pg_temp? We can check a few things in the current snaps but full support for pg_temp (e.g., guaranteeing the same connection in all snaps) will require a lot of research into its feasibility.

@cesar.bolanos did you try defining TIMESTAMP without time zone?

This also happens with int fields

Weird! Open up a SL support ticket.

Also, you can try using the generic JDBC snap and configuring the appropriate driver for the Postgres version you are using. That should work, but it may perform worse.

Could you submit a ticket for that? It should ideally contain the table’s schema, a sample input document, and any errors that show the generated SQL.

For background, we perform a database metadata query to learn what the database expects or provides, and we have a few dozen classes that convert between the database types and Document field types. In theory it should be easy to make the correct choice, but even among the standard field types there are subtle differences, sometimes between major versions of a single product, that can cause problems with the conversions.

Off the top of my head, the only reason we would try to use a varchar instead of a date or int is if the database metadata reports an unrecognized field type. That might be due to a new field type in a recently released version, a locally installed extension (e.g., PostGIS) that defines its own types, or simply an oversight. In some of those cases we can add the value to the internal lookup tables in the next release, but we need to know what to add.
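The lookup described above might look roughly like this. The numeric codes match the real `java.sql.Types` constants, but the converter table and the varchar fallback for an unknown code are assumptions for illustration, not the product's confirmed behavior:

```python
# Illustrative mapping from JDBC type codes to target SQL types, with a
# varchar fallback for unrecognized codes (e.g. an extension type).
# The numeric codes mirror java.sql.Types; everything else is assumed.
TIMESTAMP, NUMERIC, VARCHAR = 93, 2, 12   # java.sql.Types constants

CONVERTERS = {
    TIMESTAMP: "timestamp",
    NUMERIC: "numeric",
    VARCHAR: "varchar",
}

def converter_for(jdbc_type_code):
    # An unknown code (new server type, PostGIS geometry, ...) falls back
    # to varchar, which would produce exactly the cast(NULL as varchar)
    # seen in the error above.
    return CONVERTERS.get(jdbc_type_code, "varchar")
```

For example, `java.sql.Types.OTHER` (1111), which JDBC drivers often report for database-specific types, would take the fallback path in this sketch.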