
Determine changes between two data files

pkona
Former Employee

Created by @pkona


This standard mode pipeline pattern compares two data files and separates the differences into individual files: inserts, updates, deletes, no-changes, and older-records-for-updates (the previous versions of records that were updated).
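
To make the five categories concrete, here is a minimal Python sketch of the same classification done outside of a pipeline. It is not the pipeline's implementation (the pattern itself is built from the Snaps listed under Configuration). The function name `classify_changes`, the key column `id`, and the sample data are all hypothetical, and the sketch assumes each file has a unique key per record and fits in memory.

```python
import csv
import io


def classify_changes(old_rows, new_rows, key):
    """Split two lists of row dicts into change categories, matched on `key`.

    Assumes `key` is unique within each list (hypothetical simplification).
    """
    old_by_key = {row[key]: row for row in old_rows}
    new_by_key = {row[key]: row for row in new_rows}

    result = {
        "inserts": [],
        "updates": [],
        "deletes": [],
        "no_changes": [],
        "older_records_for_updates": [],
    }

    # Walk the new file: each record is an insert, an update, or unchanged.
    for k, new_row in new_by_key.items():
        old_row = old_by_key.get(k)
        if old_row is None:
            result["inserts"].append(new_row)
        elif old_row == new_row:
            result["no_changes"].append(new_row)
        else:
            result["updates"].append(new_row)
            # Keep the previous version next to the updated one.
            result["older_records_for_updates"].append(old_row)

    # Anything present only in the old file was deleted.
    for k, old_row in old_by_key.items():
        if k not in new_by_key:
            result["deletes"].append(old_row)

    return result


def read_csv(text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


# Hypothetical sample data standing in for the two input files.
OLD_CSV = "id,name,city\n1,Ann,Oslo\n2,Bob,Rome\n3,Cy,Lima\n"
NEW_CSV = "id,name,city\n1,Ann,Oslo\n2,Bob,Paris\n4,Di,Kyiv\n"

changes = classify_changes(read_csv(OLD_CSV), read_csv(NEW_CSV), key="id")
for category, rows in changes.items():
    print(category, rows)
```

With this sample, record 4 is an insert, record 2 is an update (its Rome version lands in older-records-for-updates), record 3 is a delete, and record 1 is a no-change. Each list could then be written to its own file, mirroring the separate outputs the pattern produces.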

This pattern is best suited to small and medium files (fewer than 50 million records). For larger files (more than 5 GB or 100 million records), consider using Spark mode pipelines for better performance.


Configuration

Velocity templates help generate code, schemas, and text.

Sources: Files
Targets: Files
Snaps used: CSV Generator, JSON Generator, Head, CSV Formatter, CSV Parser, Mapper, Copy, Script, JSON Formatter, File Writer

Downloads

Generate_csvSchema4Spark.slp (26.7 KB)
