Convert data into various file formats and write to HDFS

dmiller
Former Employee

Created by @pkona


This pattern consists of two pipelines:

  • Setup and HDFS RW (Standard), which contains the datasets and writes them to HDFS
  • Hadoop file formats (Standard), which converts the CSV dataset into file formats commonly used in Hadoop and big data use cases

File formats include:

  1. CSV
  2. JSON
  3. Parquet
  4. ORCFile
  5. Avro
  6. SequenceFile
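
For readers who want to see what one of these conversions amounts to outside the pipeline, here is a minimal sketch of the CSV-to-JSON case using only the Python standard library. It is an illustration, not the pipeline's implementation: the function name `csv_to_json_lines` and the sample data are hypothetical, and the other formats (Parquet, ORC, Avro, SequenceFile) would need third-party libraries that are not shown here.

```python
import csv
import io
import json


def csv_to_json_lines(csv_text: str) -> str:
    """Convert CSV text with a header row into newline-delimited JSON.

    Hypothetical helper for illustration; every value stays a string
    because CSV carries no type information.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return "\n".join(json.dumps(row) for row in reader)


# Hypothetical sample data, not the dataset shipped with the pattern.
sample = "id,name\n1,alice\n2,bob\n"
json_lines = csv_to_json_lines(sample)
print(json_lines)
```

Each CSV row becomes one JSON object keyed by the header names, which is the usual shape for JSON files stored on HDFS.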

[Screenshot: write-hdfs-format pipeline]

Configuration

Specify values for the following pipeline parameters:

  • hdfs_base_uri
  • hdfs_folder_path
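
As a rough illustration of how these two parameters might combine into a target location, the sketch below joins a base URI and a folder path into a full file URI. This is an assumption about how the values fit together, not a description of what the pipelines do internally; the helper `build_target_uri`, the host name, the port, and the folder are all hypothetical.

```python
import posixpath


def build_target_uri(hdfs_base_uri: str, hdfs_folder_path: str, file_name: str) -> str:
    """Join a base URI, a folder path, and a file name into one HDFS URI.

    Hypothetical helper: assumes the base URI holds scheme, host, and port,
    and the folder path is the directory below the filesystem root.
    """
    base = hdfs_base_uri.rstrip("/")
    path = posixpath.join("/", hdfs_folder_path.strip("/"), file_name)
    return base + path


# Hypothetical example values; substitute those for your own cluster.
hdfs_base_uri = "hdfs://namenode.example.com:8020"
hdfs_folder_path = "/tmp/formats"

target = build_target_uri(hdfs_base_uri, hdfs_folder_path, "data.parquet")
print(target)
```

Stripping slashes before joining keeps the result stable whether or not the parameter values are entered with leading or trailing slashes.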

Source: CSV file on HDFS
Targets: Avro, Parquet, CSV, JSON, ORCFile, SequenceFile files on HDFS
Snaps used:

Downloads


Diane Miller