Back in May 2017, we laid out why we believe that Kafka Streams is better off without a concept of watermarks or triggers, and instead opts for a continuous refinement model. This article explains how we are fundamentally sticking with this model, while also opening the door for use cases that are incompatible with continuous refinement.

By continuous refinement, I mean that Kafka Streams emits new results whenever records are updated. You don't have to worry about telling each operator when it should emit results. You get to focus on the logic of your data processing pipeline.

Whether Streams emits every single update or groups updates is irrelevant to the semantics of a data processing application. It is important for the operational characteristics, though. High-volume applications may not be able to process and transmit every update within the constraints of CPU, memory, network and disk. Because this is an operational problem, not a semantic one, Streams gives you operational parameters to tune a record cache rather than API operators.

The continuous-refinement-with-operational-parameters model is a very simple and powerful design for a data processing system. It eliminates irrelevant details from applications while providing complete semantics. However, many data processing applications will eventually produce results into a system outside of Apache Kafka®. Some of these external systems are designed in a way that doesn't support continuous refinement, and it would be nice to support developers who have to work with these systems.

Also, there are plenty of use cases that are not pure data processing but need to produce side effects. Sending emails, firing off alerts or even committing resources, like telling a driver to start a delivery run, are all examples of side effects. Anyone who has accidentally hit "send" too early or been woken up in the middle of the night by a flaky alert knows that sending a follow-up message is not exactly the same as a continuous update.

Kafka Streams now supports these use cases by adding Suppress. Suppress is an optional DSL operator that offers strong guarantees about when exactly it forwards KTable updates downstream. Since it's an operator, you can use it to control the flow of updates in just the parts of your application that need it, leaving the majority of your application to be governed by the unified record cache configuration. You can use Suppress to effectively rate-limit just one KTable output or callback. Or, especially valuable for non-retractable outputs like alerts, you can use it to get only the final results of a windowed aggregation.

The clearest use case for Suppress at the moment is getting final results, so we'll start by looking at the feature from that perspective.

Use case: Alerting

Consider building a metrics and alerting pipeline in which events are bucketed into two-minute windows. By default, Kafka Streams will update a window's output whenever the window gets a new event (or soon afterward, depending on your record cache configuration).

Let's say you built some dashboard graphs by querying the result tables using Interactive Query. Each time the graph refreshes, it will get the most recent metric values in each window. If you want to send an email when a key has fewer than four events in a window, it seems pretty straightforward… But this program is broken! The filter is initially true for every key in every window, since the counts would progress through one, two and three before reaching four. So we'd wind up sending emails for every key before later realizing that we shouldn't have sent them. We can send a follow-up email telling people to ignore the first message, but they'll still receive both when they would have preferred no message at all.

You can try to suppress those intermediate states by using Streams' record cache and commit interval configs, but it's going to be difficult (or impossible) to get them to line up perfectly with your windows, especially if you have two different streams with different window configs in the same application.

It would be a whole lot simpler if you could execute the callback only on the final result of the computation. Well, now you can!

Final results with the Suppress operation

Our design challenge was to preserve the fundamental simplicity of Kafka Streams. You shouldn't have to worry about event triggering patterns unless it's important for your business logic.
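To make the alerting example concrete, here is a sketch of what the topology could look like with Suppress. It assumes Kafka Streams 2.1+ on the classpath; the topic name `events`, the serdes, the one-minute grace period, and the `sendAlertEmail` helper are all illustrative choices, not part of the article. The key line is `suppress(Suppressed.untilWindowCloses(...))`, which holds back intermediate counts until each window closes:

```java
import java.time.Duration;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Suppressed;
import org.apache.kafka.streams.kstream.Suppressed.BufferConfig;
import org.apache.kafka.streams.kstream.TimeWindows;

public class LowCountAlertTopology {

    public static StreamsBuilder build() {
        StreamsBuilder builder = new StreamsBuilder();
        builder
            .stream("events", Consumed.with(Serdes.String(), Serdes.String()))
            .groupByKey()
            // Two-minute windows, as in the alerting example; the grace period
            // bounds how long late events are accepted before a window closes.
            .windowedBy(TimeWindows.of(Duration.ofMinutes(2)).grace(Duration.ofMinutes(1)))
            .count()
            // Emit each window's count only once, after the window closes.
            // Without this line, the filter below also sees the intermediate
            // counts 1, 2 and 3 and would fire spurious alerts.
            .suppress(Suppressed.untilWindowCloses(BufferConfig.unbounded()))
            .toStream()
            .filter((windowedKey, count) -> count < 4)
            .foreach((windowedKey, count) ->
                sendAlertEmail(windowedKey.key(), count)); // hypothetical side effect
        return builder;
    }

    private static void sendAlertEmail(String key, Long count) {
        System.out.printf("ALERT: key %s saw only %d events%n", key, count);
    }
}
```

Deleting only the `suppress` line turns this back into the broken continuous-refinement version described above, which is what makes the operator easy to retrofit.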
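To see why filtering on intermediate results misfires, here is a small self-contained simulation with no Kafka dependency (plain Java; `FinalResultDemo` and its method names are illustrative). It replays one window's events through the "count then filter for fewer than four" logic, first on every intermediate count and then only on the final one:

```java
import java.util.ArrayList;
import java.util.List;

public class FinalResultDemo {

    // Continuous refinement: the filter is evaluated on every updated count,
    // so it matches while the count passes through 1, 2 and 3, even when the
    // window ultimately accumulates four or more events.
    static List<Long> eagerAlerts(List<String> eventsInWindow) {
        List<Long> alerts = new ArrayList<>();
        long count = 0;
        for (String ignored : eventsInWindow) {
            count++;
            if (count < 4) {
                alerts.add(count); // spurious alert for an intermediate count
            }
        }
        return alerts;
    }

    // Final results: evaluate the filter once, on the closed window's count.
    static List<Long> finalResultAlerts(List<String> eventsInWindow) {
        long finalCount = eventsInWindow.size();
        return finalCount < 4 ? List.of(finalCount) : List.of();
    }

    public static void main(String[] args) {
        List<String> window = List.of("e1", "e2", "e3", "e4", "e5");
        System.out.println(eagerAlerts(window));       // three false alarms: [1, 2, 3]
        System.out.println(finalResultAlerts(window)); // correctly silent: []
    }
}
```

The eager version sends three emails it later has to retract; the final-results version sends none, which is exactly the behavior the Suppress operator provides inside a real topology.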