
dmiller
Former Employee
7 years ago

Convert data into various file formats and write to HDFS

Created by @pkona


This pipeline pattern consists of two pipelines:

  • Setup and HDFS RW (Standard), which contains the datasets and writes them to HDFS
  • Hadoop file formats (Standard), which converts the CSV dataset into file formats commonly used in Hadoop and big data use cases

The file formats covered are:

  1. CSV
  2. JSON
  3. Parquet
  4. ORCFile
  5. Avro
  6. SequenceFile
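
For readers who want to see what one of these conversions amounts to outside SnapLogic, here is a minimal sketch in Python of the CSV to JSON step, using only the standard library. It is not the pipeline's implementation; the function name `csv_to_json_lines` and the sample data are made up for illustration. The binary formats in the list (Parquet, ORCFile, Avro, SequenceFile) generally need a schema and a dedicated library, so they are not shown here.

```python
import csv
import io
import json


def csv_to_json_lines(csv_text):
    """Convert CSV text with a header row into newline-delimited JSON.

    Hypothetical helper for illustration; every value stays a string
    because plain CSV carries no type information.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return "\n".join(json.dumps(row) for row in reader)


# Made-up sample data, not the dataset shipped with the pattern.
sample = "id,name\n1,alpha\n2,beta\n"
print(csv_to_json_lines(sample))
# {"id": "1", "name": "alpha"}
# {"id": "2", "name": "beta"}
```

Note that the numeric column comes out as a string; a typed format such as Avro or Parquet would need an explicit schema to say otherwise.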

Configuration

Specify values for the following pipeline parameters:

  • hdfs_base_uri
  • hdfs_folder_path
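
As a rough illustration of how these two parameters might combine into full target locations, here is a small Python sketch. The example values, the file extensions, the `dataset` base name, and the helper names `build_target_uri` and `target_uris` are all assumptions made for this example; the pipelines themselves may name their output files differently.

```python
import posixpath

# Hypothetical example values; substitute your own cluster details.
HDFS_BASE_URI = "hdfs://namenode.example.com:8020"
HDFS_FOLDER_PATH = "/user/demo/file_formats"

# File extensions chosen for illustration only.
TARGET_EXTENSIONS = {
    "CSV": "csv",
    "JSON": "json",
    "Parquet": "parquet",
    "ORCFile": "orc",
    "Avro": "avro",
    "SequenceFile": "seq",
}


def build_target_uri(base_uri, folder_path, file_name):
    """Join base URI, folder path, and file name with single slashes."""
    path = posixpath.join("/", folder_path.strip("/"), file_name)
    return base_uri.rstrip("/") + path


def target_uris(base_uri, folder_path, dataset="dataset"):
    """Return one target URI per file format for a given dataset name."""
    return {
        fmt: build_target_uri(base_uri, folder_path, f"{dataset}.{ext}")
        for fmt, ext in TARGET_EXTENSIONS.items()
    }


for fmt, uri in target_uris(HDFS_BASE_URI, HDFS_FOLDER_PATH).items():
    print(f"{fmt}: {uri}")
```

Stripping the slashes before joining means it does not matter whether hdfs_base_uri ends with a slash or hdfs_folder_path starts with one.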

Sources: CSV file on HDFS
Targets: Avro, Parquet, CSV, JSON, ORCFile, SequenceFile files on HDFS
Snaps used:
