Reference Implementation - Integration Pattern - On-Prem Database to Cloud Data Warehouse

This is a common integration pattern that customers tend to use as they move from on-prem to the cloud.

The following videos describe this integration pattern, and the attached reference implementation pipelines help you get a head start on the implementation.

Part 1

Part 2

Pipelines Attached

The pipelines move all the tables from an Oracle database to Redshift. They are designed around our best practices, such as parameterization of all aspects of the pipeline (e.g. account names, DB names, table names). Everything is dynamic and controlled via pipeline parameters.
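For readers without the .slp files handy, here is a minimal sketch in Python (not SnapLogic's actual configuration format; the account, DB, and table names are all hypothetical) of the parent/child parameterization idea: the parent enumerates tables and invokes a child once per table, passing everything as parameters rather than hard-coding it.

```python
# Hypothetical sketch of the parent/child parameterization idea:
# the parent enumerates tables and invokes a child "pipeline" once
# per table, passing account/db/table values as parameters.

from dataclasses import dataclass

@dataclass
class PipelineParams:
    source_account: str   # e.g. the Oracle account/connection name
    target_account: str   # e.g. the Redshift account/connection name
    db_name: str
    table_name: str

def run_child_pipeline(params: PipelineParams) -> None:
    # Stand-in for Pipeline Execute: the child reads params.table_name
    # from the source and bulk-loads it into the target.
    print(f"Loading {params.db_name}.{params.table_name} "
          f"from {params.source_account} to {params.target_account}")

def run_parent(tables: list[str]) -> None:
    # Stand-in for Oracle - Table List -> Filter -> Pipeline Execute.
    for table in tables:
        run_child_pipeline(PipelineParams(
            source_account="oracle_prod",    # hypothetical
            target_account="redshift_dwh",   # hypothetical
            db_name="SALES",
            table_name=table,
        ))

run_parent(["CUSTOMERS", "ORDERS", "ORDER_ITEMS"])
```

Because nothing table-specific is baked into the child, the same pair of pipelines can fan out across an arbitrary table list.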

Source: Oracle Tables
Target: Redshift Tables
Snaps used:

  • Step 1: Oracle - Table List, Filter, Oracle - Execute, Mapper, Pipeline Execute
  • Step 2: Mapper, Redshift - Execute, Router, Join, Oracle - Select, Shard Offsets*, Redshift - Bulk Load, Pipeline Execute, Exit, Union
  • Step 3: Mapper, Oracle - Select, Redshift - Bulk Load, Exit

*This is a Snap developed by the SnapLogic Professional Services team.

Downloads

Pattern 1 - Step 1.0 - Oracle to Redshift Parent_2018_06_25.slp (9.4 KB)
Pattern 1 - Step 1.1 - Oracle to Redshift No Shredding_2018_06_25.slp (20.9 KB)
Pattern 1 - Step 1.2 - Oracle to Redshift Shred_2018_06_25.slp (7.7 KB)


I like your pattern for the sharding; I have had to implement something similar. Your example pipelines call the shard table Snap, which doesn't exist in my environment. I'm wondering: is this generally available, or is it custom?

Thanks

Shard Offsets was created by our Professional Services team. Talk with your account manager for information about availability.


I'm trying to make this pattern work for MSSQL to Snowflake. Do you have a working implementation for Snowflake? I cannot seem to get the bulk load to work from the MSSQL - Select Snap to the Snowflake - Bulk Load Snap. Is there functionality in the Oracle bulk loader that is needed for this pattern?

Lastly, I can make this work if I create a specific Mapper and bulk loader for each table, but that seems like a lot of redundant work (100+ tables' worth).

Let me know.

The Oracle extract uses LIMIT/OFFSET to help with sharding the data. If you plan to skip sharding, you can easily convert the existing pattern to use MSSQL as the source and Snowflake as the target. Set sharding to no in the pipeline parameters on the master pipeline.
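To illustrate the idea (this is not the Professional Services Shard Offsets Snap, whose internals aren't public), a minimal sketch of offset-based sharding might look like the following; the table and key names are hypothetical, and the OFFSET/FETCH syntax assumes Oracle 12c+.

```python
# Minimal sketch of LIMIT/OFFSET sharding: split a table of row_count
# rows into num_shards ranges and emit one paginated SELECT per shard,
# so each shard can be loaded by a parallel child pipeline.
# A stable ORDER BY key is required so the pages do not overlap.

def shard_queries(table: str, order_key: str,
                  row_count: int, num_shards: int) -> list[str]:
    shard_size = -(-row_count // num_shards)  # ceiling division
    return [
        f"SELECT * FROM {table} ORDER BY {order_key} "
        f"OFFSET {i * shard_size} ROWS "
        f"FETCH NEXT {shard_size} ROWS ONLY"
        for i in range(num_shards)
    ]

# Example: 1M rows split across 4 parallel loads (hypothetical table).
for q in shard_queries("SALES.ORDERS", "ORDER_ID",
                       row_count=1_000_000, num_shards=4):
    print(q)
```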

Here is an example of a pattern that might also help; it gets files from S3 and inserts them into Snowflake.

Sharding is not the issue.

The issue is that there are data type mismatches between MSSQL and Snowflake, which implies we need a Mapper. But Mappers are not generic, so the pattern breaks down (see the sketch after this post for one possible workaround).

Try the pattern with MSSQL and Snowflake. Also use BIT fields. I suspect date fields are an issue as well.

Thanks,

Sonny
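On the per-table Mapper problem above: here is a minimal sketch, assuming column types can be fetched once per table (e.g. from INFORMATION_SCHEMA.COLUMNS), of a generic value coercion that a single reusable step could apply to all 100+ tables. The type names and conversions shown are assumptions to verify against your data, not SnapLogic's actual behavior.

```python
# Sketch of one way to avoid a hand-built Mapper per table: coerce
# values generically, driven by source column metadata, so one
# reusable step handles every table.

from datetime import datetime
from typing import Any

def coerce_value(value: Any, source_type: str) -> Any:
    """Convert an MSSQL value into a Snowflake-friendly Python value."""
    if value is None:
        return None
    t = source_type.upper()
    if t == "BIT":                      # MSSQL BIT (0/1) -> BOOLEAN
        return bool(int(value))
    if t in ("DATETIME", "DATETIME2", "SMALLDATETIME"):
        # Normalize to ISO-8601 text, which TIMESTAMP_NTZ can parse.
        if isinstance(value, datetime):
            return value.isoformat(sep=" ")
        return str(value)
    return value                        # pass everything else through

def coerce_row(row: dict, column_types: dict) -> dict:
    # column_types maps column name -> MSSQL type name, fetched once
    # per table (hypothetically from INFORMATION_SCHEMA.COLUMNS).
    return {col: coerce_value(val, column_types.get(col, ""))
            for col, val in row.items()}

row = {"ID": 7, "ACTIVE": 1, "CREATED": datetime(2018, 6, 25, 9, 30)}
types = {"ID": "INT", "ACTIVE": "BIT", "CREATED": "DATETIME"}
print(coerce_row(row, types))
# {'ID': 7, 'ACTIVE': True, 'CREATED': '2018-06-25 09:30:00'}
```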


Here is a DB data type cross-reference I built, which could be handy. Have you already done some experiments, or are you suspecting it will be an issue?
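For illustration (the attached cross-reference itself isn't reproduced here), a fragment of such an MSSQL-to-Snowflake mapping might look like the following sketch; the exact choices, e.g. NTZ vs. TZ timestamps, depend on your data, so verify each entry against the vendor docs.

```python
# Illustrative fragment of an MSSQL -> Snowflake data type
# cross-reference; extend and verify against vendor documentation.

MSSQL_TO_SNOWFLAKE = {
    "BIT":              "BOOLEAN",
    "TINYINT":          "SMALLINT",
    "INT":              "INTEGER",
    "BIGINT":           "BIGINT",
    "DECIMAL":          "NUMBER",
    "FLOAT":            "FLOAT",
    "DATETIME":         "TIMESTAMP_NTZ",
    "DATETIME2":        "TIMESTAMP_NTZ",
    "DATETIMEOFFSET":   "TIMESTAMP_TZ",
    "DATE":             "DATE",
    "VARCHAR":          "VARCHAR",
    "NVARCHAR":         "VARCHAR",
    "UNIQUEIDENTIFIER": "VARCHAR(36)",
}

def target_type(mssql_type: str) -> str:
    # Fall back to VARCHAR for anything unmapped, so unknown source
    # types are flagged for manual review rather than silently dropped.
    return MSSQL_TO_SNOWFLAKE.get(mssql_type.upper(), "VARCHAR")

print(target_type("bit"))       # BOOLEAN
print(target_type("datetime"))  # TIMESTAMP_NTZ
```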

Yes, there is a data type issue. I've seen it with BIT vs. BOOLEAN, and I made that work. But I also suspect that other types have a similar issue (DATETIME, TIMESTAMP_NTZ, etc.).

I’m not clear on why the reference architecture works.

  • Does the Redshift bulk loader have some functionality that the Snowflake bulk loader lacks?
  • Did the data types not have any issues like BIT vs. BOOLEAN, etc.?

Thanks for the input. I'm not really concerned about sharding (at the moment); it's the bulk loader and data types that concern me. The reference architecture connects the bulk loader directly to the MSSQL - Select, but the data types won't convert properly (BIT to BOOLEAN, etc.).

Can you explain why it works for Oracle to Redshift? Maybe that will help me fix or understand the MSSQL to Snowflake approach.