I frequently get asked by customers - does SnapLogic support Change Data Capture (CDC)? My response usually is - what does CDC mean to you? What are you looking to do?
While there are a few organizations that are looking to do CDC in the traditional sense (log scraping etc) many of these customers are looking to optimize data movement by only moving records that have changed since the last move.
Especially in the context of SFDC, Workday or other SAAS applications, CDC in the traditional sense implemented via techniques like log scraping makes limited sense.
As a result, SnapLogic’s approach outlined below becomes especially relevant in these contexts.
Our approach is to enable Query based CDC, that assumes the existence of a ‘last updated’ timestamp field in the source data that we can compare against.
Below is an example of query based CDC implemented within a SnapLogic pipeline, that extracts Account objects from Salesforce that have changed since the last move, and loads into SQL Server. The subsequent pipeline implements the reverse data movement - from SQL Server to Salesforce.
Part 1: Synchronize Salesforce to SQL Server:
This pipeline maintains the last read timestamp in a file on SLDB. The pipeline flow starts by reading the timestamp along the top path of the pipeline, and running a few sanity checks on the timestamp (data type, null value etc).
It then makes a copy of the timestamp and passes into SFDC account read (to be passed through downstream with the results of the read).
Along the top path, the pipeline then captures the current timestamp, formats and writes back to the timestamp file as the new most recent CDC time.
Along the bottom path, the timestamp is compared with a last_modified field in the SFDC account object which captures the last time when a given object changed. If the last_modified value is greater than the value of our timestamp, we keep the record, else discard it.
At the output of the filter, we now have a list of only the records that have changed since the last time we moved data from SFDC to SQL Server.
Part 2: Synchronize SQL Server to Salesforce
The reverse pipeline that implements timestamp based CDC from SQL Server to salesforce is shown below. The logic here is identical to before.
These pipelines can be run either in a periodic polling mode, or can be triggered based on some event in the source system.
Pipelines explained in this topic:
Sync SS-SF w Timestamp_2016_07_15 (1).slp (16.0 KB)
Sync SF-SS w Timestamp_2016_07_15.slp (15.8 KB)