Created by @pkona
This pattern consists of two pipelines:
- Setup and HDFS RW (Standard), which contains the datasets and writes them to HDFS
- Hadoop file formats (Standard), which converts the CSV dataset into file formats commonly used for Hadoop and Big Data use cases
File formats include:
- CSV
- JSON
- Parquet
- ORCFile
- Avro
- SequenceFile
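To illustrate the kind of conversion the second pipeline performs, here is a minimal Python sketch of one case, CSV to newline-delimited JSON, using only the standard library. It is not the pipeline's implementation: the function name `csv_to_json_lines` and the sample data are hypothetical, and the binary formats (Parquet, ORCFile, Avro, SequenceFile) would need third-party libraries that are not shown here.

```python
import csv
import io
import json


def csv_to_json_lines(csv_text: str) -> str:
    """Convert CSV text with a header row into newline-delimited JSON."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each CSV row becomes one JSON object keyed by the header names.
    # Values stay strings because CSV carries no type information.
    return "\n".join(json.dumps(row) for row in reader)


# Hypothetical sample data, not the dataset shipped with the pattern.
sample_csv = "id,name\n1,alpha\n2,beta\n"
print(csv_to_json_lines(sample_csv))
```

Note that every value comes out as a string; formats with a schema, such as Avro or Parquet, would additionally require column types to be declared or inferred.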
Configuration
Specify values for the following pipeline parameters:
- hdfs_base_uri
- hdfs_folder_path
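As a rough sketch of how these two parameters might combine into a target location, the Python snippet below joins a base URI and a folder path into one file URI per format. The example values, the `target_uri` helper, the file name, and the extension mapping are all assumptions for illustration; the pipelines themselves may name and place files differently.

```python
# Hypothetical file extension per target format (an assumption, not taken
# from the pipelines).
FORMAT_EXTENSIONS = {
    "CSV": "csv",
    "JSON": "json",
    "Parquet": "parquet",
    "ORCFile": "orc",
    "Avro": "avro",
    "SequenceFile": "seq",
}


def target_uri(hdfs_base_uri: str, hdfs_folder_path: str,
               file_name: str, fmt: str) -> str:
    """Join the two pipeline parameters into a full URI for one output file."""
    base = hdfs_base_uri.rstrip("/")      # avoid a doubled slash after the authority
    folder = hdfs_folder_path.strip("/")  # accept the folder with or without slashes
    return f"{base}/{folder}/{file_name}.{FORMAT_EXTENSIONS[fmt]}"


# Example values are placeholders; substitute your own cluster details.
for fmt in FORMAT_EXTENSIONS:
    print(target_uri("hdfs://namenode:8020", "/user/demo/formats", "dataset", fmt))
```

Normalizing the slashes on both parameters means the result is the same whether or not the values are entered with a trailing or leading `/`.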
Sources: CSV File on HDFS
Targets: Avro, Parquet, CSV, JSON, ORCFile, SequenceFile files on HDFS
Snaps used:
Diane Miller