Determine changes between two data files

Created by @pkona

This standard mode pipeline pattern compares two data files and separates the differences into individual files, one per change type: inserts, updates, deletes, no-changes, and older-records-for-updates (the previous versions of records that were updated).
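The classification itself can be illustrated outside the pipeline. The following is a minimal Python sketch, not the pipeline's actual implementation (which is built from the Snaps listed under Configuration). It assumes both files are CSV, fit in memory, and share a unique key column; the column name "id", the sample data, and the function names are hypothetical.

```python
import csv
import io


def read_csv_text(text):
    """Parse CSV text into a list of dict records (one dict per row)."""
    return list(csv.DictReader(io.StringIO(text)))


def classify_changes(old_rows, new_rows, key):
    """Split records into the five change buckets, matching rows on `key`.

    Assumption: `key` is unique within each file.
    """
    old_by_key = {row[key]: row for row in old_rows}
    new_by_key = {row[key]: row for row in new_rows}
    result = {
        "inserts": [],
        "updates": [],
        "deletes": [],
        "no_changes": [],
        "older_records_for_updates": [],
    }
    for k, new_row in new_by_key.items():
        old_row = old_by_key.get(k)
        if old_row is None:
            # Key exists only in the new file.
            result["inserts"].append(new_row)
        elif old_row == new_row:
            # Same key, identical field values.
            result["no_changes"].append(new_row)
        else:
            # Same key, different values: keep both versions.
            result["updates"].append(new_row)
            result["older_records_for_updates"].append(old_row)
    for k, old_row in old_by_key.items():
        if k not in new_by_key:
            # Key exists only in the old file.
            result["deletes"].append(old_row)
    return result


# Hypothetical sample data standing in for the two input files.
OLD_CSV = "id,name\n1,Ann\n2,Bob\n3,Cy\n"
NEW_CSV = "id,name\n1,Ann\n2,Robert\n4,Dee\n"

changes = classify_changes(read_csv_text(OLD_CSV), read_csv_text(NEW_CSV), "id")
for bucket, rows in changes.items():
    print(bucket, rows)
```

Each bucket corresponds to one of the output files the pattern writes. Holding both files in dictionaries is only reasonable for small inputs, which is in line with the size guidance for this pattern.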

This pattern is best suited to small and medium files (< 50 million records). For larger files (> 5 GB or 100 million records), consider using Spark mode pipelines for optimal performance.

Configuration

Velocity templates help generate code, schemas, and text.

Sources: Files
Targets: Files
Snaps used: CSV Generator, JSON Generator, Head, CSV Formatter, CSV Parser, Mapper, Copy, Script, JSON Formatter, File Writer

Downloads

Generate_csvSchema4Spark.slp (26.7 KB)
