Forum Discussion
Hi @ptaylor,
Appreciate your in-depth response.
I agree that I should use the “Snowflake Bulk Load” snap with the Kafka Consumer's Batch Mode option. I usually validate the pipeline and then select the $variable, which ensures correctness, but the pipeline did not work last time; after I re-validated, it started working.
I then replaced the Snowflake Insert with the Snowflake Bulk Insert, but it started to time out again.
I thought the Bulk Insert might work with the Kafka Consumer's Batch acknowledge property, but it does not; it times out.
I think I am missing something. Could you please look at the properties and tell me what I am missing here, and please attach your pipeline as well if you try it out?
Attaching all 3 pipelines; I added the extra Mappers etc. for debugging purposes only.
Bulk insert with Batch not Working Pipeline_2022_02_22.slp (17.3 KB)
Bulk insert not Working Pipeline_2022_02_22.slp (12.1 KB)
Working Pipiline_2022_02_22.slp (11.4 KB)
- neeraj_sharma · 4 years ago · New Contributor II
Yes, with the Pipeline Execute snap, it started to work and performs very well.
Just one follow-up question: earlier we wanted to pull data from different Kafka topics, join it all together in one pipeline, and insert it into one table. Now it looks like that's not possible. Do we now need to create separate pipelines for each topic, push the data into separate tables, and then create one more pipeline to join all the data and load it into the final table?
- ptaylor · 4 years ago · Employee
Ok, I’m glad that you were able to get it working well with a single topic.
Performing joins on streaming data in real time is a very advanced subject. To discuss it in detail would require much more information about your use cases and is not a discussion I can really get into here in this forum.

I would consider whether it might make sense to read the data into separate Snowflake tables and then use Snowflake to do the joins. If you need true streaming functionality like windowed joins then you might look at KsqlDB or Kafka Streams. It might be possible to do the joins in SnapLogic pipelines but that can get very tricky with real-time streams that don’t end, as our Join is designed for finite input streams.

One thing to consider is a hybrid approach where you use KsqlDB to do the joins of the separate Kafka topics, which will produce a new topic containing the joined data. Then use our Kafka Consumer snap to read that topic and insert into Snowflake.