Ingest data from Database (Oracle) into the Hadoop Data Lake (HDFS+Hive) using the AVRO as the Data format
Created by @pkona This pattern ingests all the tables present in an Oracle database into the Hadoop Data Lake. This pattern will achieve the following: Automatically creates Avro schemas and Hive tables Automatically shards large tables Ingests data into HDFS+Hive tables All the pipeline execution is done to maximize parallelized execution of the child pipelines across multiple nodes in the Snaplex Pipelines in this Patter Pattern 3 - Step 1.0 - Oracle to Hive Parent The master pipeline : Gets a list of all source tables. Calls a child pipeline (Pattern 3 - Step 1.1 - Ingest - Generate Hive table Avro based on Oracle Table metadata) to create schemas and tables on Hive. Calls a child pipeline for a table (Pattern 3 - Step 1.2 - Oracle to Hive No Sharding) The Pipeline Execute controls the number of tables to be processed in parallel. Pattern 3 - Step 1.1 - Ingest - Generate Hive table Avro based on Oracle Table metadata Creates schemas required for file data format Creates Hive table Returns folder location used by Hive Table Uses pipelines Pattern 3 - Core - Write schema to HDFS and Write Hive Table DDL to HDFS Pattern 3 - Step 1.2 - Oracle to Hive No Sharding Processes data for a single table If the source table is small, writes a single file. If table is large, calculate shard size and call child pipeline to process a shard The Pipeline Execute controls the number of shards to be processed in parallel (Pattern 3 - Step 1.2.1 Oracle to Hive - process a shard) Pattern 3 - Step 1.2.1 Oracle to Hive - process a shard Processes a shard Queries source database for a shard Writes a file for the data Each shard has its own file under the Hive table’s folder Configuration Sources: Oracle Targets: HDFS, Hive, AVRO Snaps used: Oracle, HDFS, Hive, JSON Generator Downloads Project of all Pipelines Community Patterns_DW_ DB to Hadoop DL.zip (11.6 KB) Individual Pipelines Pattern 3 - Step 1.0 - Oracle to Hive Parent_2018_08_24.slp (14.7 KB) Pattern 3 - Step 1.1 - Ingest - Generate Hive table Avro based on Oracle Table metadata_2018_08_24.slp (22.0 KB) Pattern 3 - Step 1.2 - Oracle to Hive No Sharding_2018_08_24.slp (15.3 KB) Pattern 3 - Step 1.2.1 Oracle to Hive - process a shard_2018_08_24.slp (9.8 KB) Pattern - Core - Query Hive Table Metadata_2018_08_24.slp (5.1 KB) Pattern 3 - Core - Write schema to HDFS_2018_08_24.slp (8.4 KB)5.2KViews1like2Comments- 2KViews0likes0Comments