08-24-2018 10:43 AM
Created by @pkona
This pattern ingests all of the tables in an Oracle database into the Hadoop Data Lake. It achieves the following:
The master pipeline:
Gets a list of all source tables.
Calls a child pipeline (Pattern 3 - Step 1.1 - Ingest - Generate Hive table Avro based on Oracle Table metadata) to create schemas and tables on Hive.
Calls a child pipeline for each table (Pattern 3 - Step 1.2 - Oracle to Hive No Sharding) to load its data.
The Pipeline Execute Snap controls how many tables are processed in parallel; a rough analogue of the overall flow is sketched below.
Sources: Oracle
Targets: HDFS, Hive, AVRO
Snaps used: Oracle, HDFS, Hive, JSON Generator
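For readers who want to see the control flow in one place, here is a minimal Python sketch of what the parent pipeline orchestrates: list the source tables, derive an Avro schema for each table from the Oracle column metadata, then load the tables in parallel with a bounded pool (the role Pipeline Execute plays in the pattern). The queries, the type mapping, and the function names are illustrative assumptions; in the pattern itself this logic is implemented entirely with Snaps.

from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 5  # analogous to the Pipeline Execute pool size in the pattern

# Assumed, simplified mapping from common Oracle types to Avro types.
ORACLE_TO_AVRO = {
    "VARCHAR2": "string",
    "CHAR": "string",
    "NUMBER": "double",
    "DATE": "string",
    "TIMESTAMP": "string",
}

def get_source_tables(cursor):
    """Step 1.0 analogue: list all source tables (hypothetical query)."""
    cursor.execute("SELECT table_name FROM user_tables")
    return [row[0] for row in cursor.fetchall()]

def generate_avro_schema(cursor, table):
    """Step 1.1 analogue: build an Avro schema from Oracle column metadata."""
    cursor.execute(
        "SELECT column_name, data_type FROM user_tab_columns "
        "WHERE table_name = :t",
        {"t": table},
    )
    fields = [
        {"name": name, "type": ["null", ORACLE_TO_AVRO.get(dtype, "string")]}
        for name, dtype in cursor.fetchall()
    ]
    return {"type": "record", "name": table, "fields": fields}

def ingest_table(table):
    """Step 1.2 analogue: extract one table and load it into Hive."""
    # In the pattern this is the "Oracle to Hive No Sharding" child pipeline:
    # read from Oracle, write Avro files to HDFS, expose them as a Hive table.
    print(f"ingesting {table}")

def run_master_pipeline(cursor):
    """Step 1.0 analogue: orchestrate schema creation and parallel loads."""
    tables = get_source_tables(cursor)
    schemas = {t: generate_avro_schema(cursor, t) for t in tables}
    # Step 1.1 would write each schema to HDFS and issue a Hive CREATE TABLE here.
    # Load the tables in parallel, bounded by POOL_SIZE, just as Pipeline
    # Execute bounds the number of concurrent child pipeline runs.
    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        list(pool.map(ingest_table, tables))

The cursor passed in is any DB-API cursor connected to the source Oracle database (for example one from cx_Oracle); the sketch only illustrates the orchestration, not the actual extract and load work done by the Snaps.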
Community Patterns_DW_ DB to Hadoop DL.zip (11.6 KB)
Pattern 3 - Step 1.0 - Oracle to Hive Parent_2018_08_24.slp (14.7 KB)
Pattern 3 - Step 1.1 - Ingest - Generate Hive table Avro based on Oracle Table metadata_2018_08_24.s... (22.0 KB)
Pattern 3 - Step 1.2 - Oracle to Hive No Sharding_2018_08_24.slp (15.3 KB)
Pattern 3 - Step 1.2.1 Oracle to Hive - process a shard_2018_08_24.slp (9.8 KB)
Pattern - Core - Query Hive Table Metadata_2018_08_24.slp (5.1 KB)
Pattern 3 - Core - Write schema to HDFS_2018_08_24.slp (8.4 KB)
06-02-2020 07:08 AM
Hi Prasad Kona, this is a great pattern.
To me, this is the initial setup to get all of your data (at run time) into your target environment.
Now, how would you ensure that the data is kept up to date (missing records added, data altered in the Oracle DB, etc.)?
This is the part I am struggling with: keeping the data in sync from Oracle to Hadoop.
Thanks,
06-02-2020 07:42 AM
Prasad is no longer in this community. Perhaps someone else who has used this pattern can provide further insight.