Forum Discussion
Hi,
Yes, I was not adding the metadata in the Mapper. I did insert the metadata into Snowflake and then Acknowledge, and it started to work, but as you mentioned, that is unnecessary information to save on the table. So I tried the method you mentioned: copying the Kafka Consumer output and joining it back in before the Acknowledge, using a Merge join without any key, or with a key of partition and offset, but it does not work. It gives me an error like:
Failure: Unable to acknowledge message: Test-2:25, Reason: Acknowledgement will succeed only if the Consumer that produced the document runs during the same validation., Resolution: Hold Shift key when clicking Validate to force all snaps to run.
I am missing some trick. The pipeline is attached:
NP new pipeline 9_2022_02_20.slp (9.6 KB)
And one more follow-up question: we need to pull a lot of data, maybe millions of records every day. I think Acknowledge is a bit slow, even with a batch acknowledge approach.
Should we put all Kafka messages into S3 first and then read them from there, so that even if we skip any data we can go back to the file and reprocess it?
Please let me know what you think.
- ptaylor (Employee), 4 years ago
The Acknowledge is failing because the metadata is present in the input document, but it's not in the default location under the document root ($metadata) because of how the Join combines and restructures the data from its inputs. Try validating the pipeline, then preview the Join's output to note where the full metadata is located within the document. Then open the Acknowledge snap, click the suggest button for the Metadata Path setting, and select the location of the metadata.
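For example (this is purely an illustration; the actual field names depend on how your Join is configured, so treat them as hypothetical), the joined document might end up shaped something like this:

```python
# Hypothetical shape of one document coming out of the Join.
# Preview the Join's output in your own pipeline to see the real structure.
joined_doc = {
    "original": {
        "value": "...",            # the Kafka message payload
        "metadata": {              # the Kafka metadata the Acknowledge snap needs
            "topic": "Test",
            "partition": 2,
            "offset": 25,
        },
    },
    "snowflake_result": "...",     # data contributed by the other Join input
}
# With a structure like this, the Acknowledge snap's Metadata Path would need to be
# $original.metadata rather than the default $metadata.
```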
Also, note the advice in the error message about holding the Shift key when you click the Validate icon. That will force all snaps to run instead of relying on cached preview data from previous validations for snaps that you haven't edited. This is important for the way the Consumer and Acknowledge snaps interact.
As for performance, the bottleneck in your pipeline is the fact that you’re inserting one record at a time into Snowflake. You’ll have far better performance with data warehouses like Snowflake if you do bulk loading (inserting many records in one operation). Frankly, I’m not really familiar with our Snowflake snaps, but I think Bulk Load or Bulk Upsert are better suited for your use case. Check our documentation for those snaps and if you still have questions, ask them here in the Community in a new post.
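If it helps to see the difference outside of SnapLogic, here's a very rough sketch of the same idea using Snowflake's Python connector directly; the connection parameters, stage, table, and file names are all placeholders, and the bulk snaps take care of the staging step for you:

```python
# Rough sketch only: row-by-row inserts vs. a bulk load into Snowflake.
# MY_STAGE, MY_TABLE, the credentials, and the file path are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="MY_WH", database="MY_DB", schema="PUBLIC",
)
cur = conn.cursor()

# Slow: one small INSERT (and one round trip) per record.
# for rec in records:
#     cur.execute("INSERT INTO MY_TABLE (id, payload) VALUES (%s, %s)",
#                 (rec["id"], rec["payload"]))

# Fast: stage a file containing many records, then load it in a single COPY operation.
cur.execute("PUT file:///tmp/batch_0001.csv @MY_STAGE")
cur.execute("COPY INTO MY_TABLE FROM @MY_STAGE/batch_0001.csv FILE_FORMAT = (TYPE = CSV)")
conn.close()
```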
However, right now your Kafka Consumer snap is configured with Acknowledge Mode = Wait after each record, which means the Consumer will output a single document, then wait for the Acknowledge snap to ack that document before it outputs the next record. Obviously that's incompatible with the requirements of a bulk loading snap. (You also have Message Count set to 1, but I'm guessing that was for debugging purposes and you'll set it back to the default, -1.)

Fortunately, the Kafka Consumer snap has a lot of flexibility to deal with such scenarios. At a minimum, you'll need to change Acknowledge Mode to Wait after each batch of records. This lets the Consumer output many records at a time, then wait for all of those records to be acknowledged before asking the Kafka broker for more records to process. In your case, you'll probably also need to change the Output Mode to One output document per batch and then use the Pipeline Execute snap to process each batch in a child pipeline. You would put the Snowflake bulk loading snap in the child pipeline; each execution of the child pipeline would process one batch of records received from Kafka. That will vastly improve your performance.

You can find an article I wrote about this to get a much better idea of how this works here:
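Separately from that article, if it helps to see the batch-acknowledge pattern outside of SnapLogic, here's a rough sketch of the same idea using the plain kafka-python client; the topic, group id, and the bulk_load_into_snowflake helper are placeholders standing in for what the Consumer snap, the child pipeline, and the bulk loading snap do for you:

```python
# Rough sketch of "fetch a batch, load it, then acknowledge it" with kafka-python.
from kafka import KafkaConsumer

def bulk_load_into_snowflake(records):
    """Placeholder for whatever bulk-loads one batch (e.g. the child pipeline)."""
    ...

consumer = KafkaConsumer(
    "my-topic",
    group_id="my-group",
    bootstrap_servers="localhost:9092",
    enable_auto_commit=False,        # acknowledge explicitly, once per batch
)

while True:
    # Fetch up to 500 records in one go.
    batch = consumer.poll(timeout_ms=1000, max_records=500)
    records = [rec for recs in batch.values() for rec in recs]
    if not records:
        continue
    bulk_load_into_snowflake(records)
    consumer.commit()                # only ask the broker for more after the batch is safely loaded
```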
Hope this helps.