dmiller
7 years ago
Former Employee
Convert data into various file formats and write to HDFS
Created by @pkona
This pattern consists of two pipelines:
- Setup and HDFS RW (Standard), which contains the datasets and writes them to HDFS
- Hadoop file formats (Standard), which converts the CSV dataset into file formats commonly used for Hadoop and big data use cases
File formats include:
- CSV
- JSON
- Parquet
- ORCFile
- Avro
- SequenceFile
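To give a rough idea of what one of these conversions does, here is a minimal Python sketch of the CSV to JSON case using only the standard library. It is not how the pipeline itself is built (the pipeline uses Snaps), and the function name `csv_to_json_lines` and the sample data are made up for illustration. Parquet, ORC, Avro, and SequenceFile need dedicated libraries and are not shown here.

```python
import csv
import io
import json


def csv_to_json_lines(csv_text: str) -> str:
    """Convert CSV text with a header row into newline-delimited JSON.

    Hypothetical helper for illustration only; every value stays a string
    because CSV carries no type information.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # One JSON object per CSV row, keyed by the header names.
    return "\n".join(json.dumps(row) for row in reader)


if __name__ == "__main__":
    sample = "id,name\n1,alpha\n2,beta\n"  # made-up sample data
    print(csv_to_json_lines(sample))
```

Note that the numeric column comes out as a string; a real conversion would usually apply a schema before writing typed formats such as Parquet or Avro.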
Configuration
Specify values for the following pipeline parameters:
- hdfs_base_uri
- hdfs_folder_path
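The post does not say exactly how these two parameters are combined, so the following Python sketch is only an assumption: it treats hdfs_base_uri as the scheme and NameNode address, and hdfs_folder_path as the directory under it. The helper `build_hdfs_path`, the host name, and the folder are all hypothetical values chosen for illustration.

```python
import posixpath


def build_hdfs_path(hdfs_base_uri: str, hdfs_folder_path: str, file_name: str) -> str:
    """Join a base URI, a folder path, and a file name into one HDFS URI.

    Assumed behaviour, not taken from the pipeline: stray slashes at the
    boundaries are trimmed so the parts join with exactly one separator.
    """
    base = hdfs_base_uri.rstrip("/")
    folder = hdfs_folder_path.strip("/")
    return f"{base}/{posixpath.join(folder, file_name)}"


if __name__ == "__main__":
    # Hypothetical parameter values; substitute those of your own cluster.
    print(build_hdfs_path("hdfs://namenode.example.com:8020",
                          "/user/demo/formats",
                          "dataset.parquet"))
```

With values like these, each target file would land under the same folder and differ only by name or extension.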
Sources: CSV file on HDFS
Targets: Avro, Parquet, CSV, JSON, ORCFile, SequenceFile files on HDFS
Snaps used: