Determine changes between two data files
Created by @pkona

This standard mode pipeline pattern determines the changes between two data files and writes each kind of change to its own file: inserts, updates, deletes, no-changes, and older-records-for-updates (the previous version of each updated record).

This pattern is best suited to small to medium files (< 50 million records). For larger files (> 5 GB or 100 million records), consider using Spark mode pipelines for optimal performance.

Configuration
Velocity templates help generate code, schemas, and text.

Sources: Files
Targets: Files
Snaps used: CSV Generator, JSON Generator, Head, CSV Formatter, CSV Parser, Mapper, Copy, Script, JSON Formatter, File Writer

Downloads
Generate_csvSchema4Spark.slp (26.7 KB)
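To make the five output categories concrete, here is a minimal Python sketch of the same classification, written outside SnapLogic. It is not the pipeline's actual implementation. It assumes both files are CSV, fit in memory, and share a unique key column (the column name `id`, the sample data, and the helper names `load_by_key` and `classify_changes` are all hypothetical).

```python
import csv
import io


def load_by_key(csv_text, key):
    """Parse CSV text into a dict mapping the key column's value to its row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row for row in reader}


def classify_changes(old_rows, new_rows):
    """Split records into the five categories described in the pattern."""
    changes = {
        "inserts": [],
        "updates": [],
        "deletes": [],
        "no_changes": [],
        "older_records_for_updates": [],
    }
    for key, new in new_rows.items():
        old = old_rows.get(key)
        if old is None:
            # Key exists only in the new file.
            changes["inserts"].append(new)
        elif old != new:
            # Key exists in both files but at least one field differs.
            changes["updates"].append(new)
            changes["older_records_for_updates"].append(old)
        else:
            changes["no_changes"].append(new)
    for key, old in old_rows.items():
        if key not in new_rows:
            # Key exists only in the old file.
            changes["deletes"].append(old)
    return changes


# Hypothetical sample data; "id" is an assumed key column.
OLD_CSV = "id,name,city\n1,Ann,Oslo\n2,Bob,Paris\n3,Cy,Rome\n"
NEW_CSV = "id,name,city\n1,Ann,Oslo\n2,Bob,Berlin\n4,Di,Lima\n"

changes = classify_changes(load_by_key(OLD_CSV, "id"), load_by_key(NEW_CSV, "id"))
for category, rows in changes.items():
    print(category, [row["id"] for row in rows])
```

In this sketch each category is a list in one dictionary, and writing each list to its own file would mirror the separate targets of the pattern. Holding both files in dictionaries is only reasonable for small inputs, which is consistent with the advice above to move to Spark mode for very large files.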