Change Data Capture With SnapLogic

I frequently get asked by customers: does SnapLogic support Change Data Capture (CDC)? My response is usually: what does CDC mean to you? What are you looking to do?

While a few organizations are looking to do CDC in the traditional sense (log scraping, etc.), many of these customers simply want to optimize data movement by moving only the records that have changed since the last run.

Especially in the context of SFDC, Workday, or other SaaS applications, CDC in the traditional sense, implemented via techniques like log scraping, makes limited sense.

As a result, the SnapLogic approach outlined below becomes especially relevant in these contexts.

Our approach is to enable query-based CDC, which assumes the existence of a ‘last updated’ timestamp field in the source data that we can compare against.
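Outside of SnapLogic, the same query-based CDC idea can be sketched in a few lines. The table and column names below are hypothetical stand-ins, using an in-memory SQLite table in place of a real source system:

```python
import sqlite3

# Hypothetical source table with a 'last_updated' timestamp column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER, name TEXT, last_updated TEXT)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [
        (1, "Acme",    "2016-07-14T09:00:00"),
        (2, "Globex",  "2016-07-15T11:30:00"),
        (3, "Initech", "2016-07-15T16:45:00"),
    ],
)

# Checkpoint from the previous run; only rows modified after it are extracted.
last_run = "2016-07-15T00:00:00"
changed = conn.execute(
    "SELECT id, name FROM account WHERE last_updated > ? ORDER BY id",
    (last_run,),
).fetchall()
# Only the two rows touched after the checkpoint come back.
```

ISO-8601 timestamps compare correctly as strings, which is why a plain `>` works here; with a real database you would compare native timestamp columns instead.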

Below is an example of query-based CDC implemented in a SnapLogic pipeline that extracts Account objects from Salesforce that have changed since the last run and loads them into SQL Server. The subsequent pipeline implements the reverse data movement, from SQL Server to Salesforce.

Part 1: Synchronize Salesforce to SQL Server:

This pipeline maintains the last-read timestamp in a file on SLDB. The pipeline flow starts by reading the timestamp along the top path and running a few sanity checks on it (data type, null value, etc.).

It then makes a copy of the timestamp and passes it into the SFDC Account read, to be carried downstream with the results of the read.

Along the top path, the pipeline then captures the current timestamp, formats it, and writes it back to the timestamp file as the new most-recent CDC time.
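As a rough sketch of this checkpoint handling, the logic amounts to reading a stored timestamp (with sanity checks), then persisting the current time for the next run. The file name and fallback behavior below are assumptions, not part of the actual pipeline:

```python
import datetime

# Hypothetical local stand-in for the timestamp file kept on SLDB.
CHECKPOINT = "last_cdc_time.txt"

def read_checkpoint(path):
    """Read the last-run timestamp, falling back to the epoch on any
    problem (missing file, empty or unparseable value) -- the same kind
    of sanity checks the pipeline performs on the stored timestamp."""
    try:
        with open(path) as f:
            return datetime.datetime.fromisoformat(f.read().strip())
    except (OSError, ValueError):
        return datetime.datetime(1970, 1, 1)

def write_checkpoint(path, ts):
    """Persist this run's start time as the new most-recent CDC time."""
    with open(path, "w") as f:
        f.write(ts.isoformat())
```

Falling back to the epoch on a bad or missing checkpoint means the first run (or a run after a corrupted file) simply behaves as a full extract, which is a common convention for this pattern.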

Along the bottom path, the timestamp is compared with the last_modified field on the SFDC Account object, which records the last time a given object changed. If the last_modified value is greater than our timestamp, we keep the record; otherwise we discard it.

At the output of the filter, we now have only the records that have changed since the last time we moved data from SFDC to SQL Server.
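The filter step itself boils down to a per-record comparison. The document shape and field names below are hypothetical stand-ins for the records flowing out of the Salesforce read:

```python
from datetime import datetime

# Checkpoint carried alongside each record from the timestamp-file read.
checkpoint = datetime(2016, 7, 15)

# Hypothetical documents as they might appear at the filter's input.
records = [
    {"Name": "Acme",   "last_modified": datetime(2016, 7, 14, 9, 0)},
    {"Name": "Globex", "last_modified": datetime(2016, 7, 15, 11, 30)},
]

# Keep only records changed since the last run; drop the rest.
changed = [r for r in records if r["last_modified"] > checkpoint]
```

Only the records whose last_modified falls after the checkpoint survive the filter and move on to the SQL Server load.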

Part 2: Synchronize SQL Server to Salesforce

The reverse pipeline, which implements timestamp-based CDC from SQL Server to Salesforce, is shown below. The logic here is identical to before.

These pipelines can be run in a periodic polling mode, or can be triggered by an event in the source system.

Pipelines explained in this topic:
Sync SS-SF w Timestamp_2016_07_15 (1).slp (16.0 KB)

Sync SF-SS w Timestamp_2016_07_15.slp (15.8 KB)


The concept is good, but how can we make sure the timestamp is updated only when the bottom path succeeds?

@Piyush, I usually do something similar to this: I add an Error output to the bottom flow (Load SFDC) and attach an Exit Snap to that Error output. I also break the top flow and route it through a Union Snap that joins with the Exit Snap. In addition, I add notification logic (a reusable, nested pipeline) to the Error output, but that is not required to address your question.

The Exit Snap will not produce a document if there is no error, so the top flow continues through the Union to the Update Timestamp Snap without interference. If there is an error on the Load SFDC Snap, the notification logic sends a notification and the Exit Snap halts the pipeline before the Union can proceed to the Update Timestamp Snap.

Replying to my own post: you will need to test this. I recall one case where I had to add a Sort at the end of the Union to make sure all processing prior to the Union completed, so the Exit could act before moving forward, but I can’t find that case right now.