Ingest data from Database (Oracle) into the Hadoop Data Lake (HDFS+Hive) using the AVRO as the Data format
Created by @pkona
This pattern ingests all the tables present in an Oracle database into the Hadoop Data Lake. This pattern will achieve the following:
- Automatically creates Avro schemas and Hive tables
 - Automatically shards large tables
 - Ingests data into HDFS+Hive tables
 - All the pipeline execution is done to maximize parallelized execution of the child pipelines across multiple nodes in the Snaplex
 
Pipelines in this Patter
Pattern 3 - Step 1.0 - Oracle to Hive Parent
The master pipeline :
- 
Gets a list of all source tables.
 - 
Calls a child pipeline (Pattern 3 - Step 1.1 - Ingest - Generate Hive table Avro based on Oracle Table metadata) to create schemas and tables on Hive.
 - 
Calls a child pipeline for a table (Pattern 3 - Step 1.2 - Oracle to Hive No Sharding)
 - 
The Pipeline Execute controls the number of tables to be processed in parallel.
 
Pattern 3 - Step 1.1 - Ingest - Generate Hive table Avro based on Oracle Table metadata
- Creates schemas required for file data format
 - Creates Hive table
 - Returns folder location used by Hive Table
Uses pipelines Pattern 3 - Core - Write schema to HDFS and Write Hive Table DDL to HDFS 
Pattern 3 - Step 1.2 - Oracle to Hive No Sharding
- Processes data for a single table
- If the source table is small, writes a single file.
 - If table is large, calculate shard size and call child pipeline to process a shard
 - The Pipeline Execute controls the number of shards to be processed in parallel (Pattern 3 - Step 1.2.1 Oracle to Hive - process a shard)
 
 
Pattern 3 - Step 1.2.1 Oracle to Hive - process a shard
- Processes a shard
 - Queries source database for a shard
 - Writes a file for the data
 - Each shard has its own file under the Hive table’s folder
 
Configuration
Sources: Oracle
Targets: HDFS, Hive, AVRO
Snaps used: Oracle, HDFS, Hive, JSON Generator
Downloads
Project of all Pipelines
Community Patterns_DW_ DB to Hadoop DL.zip (11.6 KB)
Individual Pipelines
Pattern 3 - Step 1.0 - Oracle to Hive Parent_2018_08_24.slp (14.7 KB)
Pattern 3 - Step 1.1 - Ingest - Generate Hive table Avro based on Oracle Table metadata_2018_08_24.slp (22.0 KB)
Pattern 3 - Step 1.2 - Oracle to Hive No Sharding_2018_08_24.slp (15.3 KB)
Pattern 3 - Step 1.2.1 Oracle to Hive - process a shard_2018_08_24.slp (9.8 KB)
Pattern - Core - Query Hive Table Metadata_2018_08_24.slp (5.1 KB)
Pattern 3 - Core - Write schema to HDFS_2018_08_24.slp (8.4 KB)