Created by @pkona
This pattern consists of two pipelines:
- Setup and HDFS RW (Standard), which contains the datasets and writes them to HDFS
- Hadoop file formats (Standard), which converts the CSV dataset into file formats commonly used in Hadoop and Big Data use cases
File formats include:
- Parquet
- ORCFile
- Avro
- SequenceFile

Specify values for the following pipeline parameters:
- hdfs_base_uri
- hdfs_folder_path
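
As a rough illustration of how these two parameters might fit together, the sketch below joins a base URI and a folder path into a full file location. The pattern itself does not state how the pipelines combine them, so the function name `build_hdfs_path`, the host, the folder, and the file name are all hypothetical and used only for illustration.

```python
def build_hdfs_path(hdfs_base_uri: str, hdfs_folder_path: str, file_name: str) -> str:
    """Join a base URI, a folder path, and a file name with single slashes.

    Assumption: the pipelines concatenate the two parameters in this order.
    """
    base = hdfs_base_uri.rstrip("/")      # drop any trailing slash on the URI
    folder = hdfs_folder_path.strip("/")  # drop leading/trailing slashes on the folder
    parts = [p for p in (folder, file_name) if p]  # skip empty segments
    return base + "/" + "/".join(parts)


# Hypothetical values, not taken from the pattern
print(build_hdfs_path("hdfs://namenode.example.com:8020", "/user/demo/formats", "dataset.parquet"))
```

Normalizing the slashes on both parameters means the result is the same whether or not the values are entered with a trailing or leading slash.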
Sources: CSV file on HDFS
Targets: Avro, Parquet, CSV, JSON, ORCFile, SequenceFile files on HDFS
Snaps used: