The Problem: Why Stream Processing Needs Proper Patterns
Modern applications generate events at unprecedented scale. A typical e-commerce platform might process millions of events per hour: user clicks, inventory updates, payment transactions, shipment notifications. The challenge is not just handling the volume; it is extracting meaning from these events in real time.
- Data Correlation - Events from different sources need to be joined to create complete pictures (orders with payments, users with their actions)
- Time-Based Analysis - Business questions often involve time windows: "How many orders in the last 5 minutes?" or "What is the rolling average transaction value?"
- Stateful Computations - Aggregations require maintaining state across events, which becomes complex in distributed systems
- Decoupled Processing - Teams want to deploy business logic independently without rewriting infrastructure code
Raw Kafka consumer APIs provide the building blocks, but implementing these patterns correctly requires understanding distributed state management, exactly-once semantics, and fault tolerance. This is where Kafka Streams and Spring Cloud Stream shine.
The Solution: Four Essential Stream Processing Patterns
We built four focused implementations that address the core patterns every stream processing system needs. Each project demonstrates a specific pattern while sharing a common foundation: Spring Cloud Stream with the Kafka binder.
Joins
Correlate events from multiple streams in real-time
Windows
Analyze events within time boundaries
Aggregations
Compute running totals and statistics
Routing
Dynamic dispatch to serverless functions
Pattern 1: Stream Joins - Correlating Related Events
The Use Case
Consider an order processing system. Order events arrive on one topic, payment confirmations on another. To build a complete order view, you need to join these streams based on a common key (order ID). This is not a database join - both events are in motion, arriving at unpredictable times.
How Spring Cloud Stream Handles It
The kafka-stream-join project demonstrates stateful stream processing where Kafka Streams maintains a local state store to hold events from one stream while waiting for matching events from another.
@Bean
public BiFunction<KStream<String, Order>, KStream<String, Payment>,
        KStream<String, EnrichedOrder>> processOrderPayment() {
    return (orders, payments) -> orders
        .join(payments,
            (order, payment) -> new EnrichedOrder(order, payment),
            JoinWindows.of(Duration.ofMinutes(5)),
            StreamJoined.with(Serdes.String(), orderSerde, paymentSerde))
        .peek((key, value) -> log.info("Joined: {}", value));
}
Join Types Explained
| Join Type | Behavior | Use Case |
|---|---|---|
| Inner Join | Emits only when both streams have matching keys | Orders that have been paid |
| Left Join | Emits for every left event, null if no match | All orders, with payment if available |
| Outer Join | Emits for both streams, nulls where no match | Complete event audit trail |
The critical parameter is the join window - how long to wait for a matching event. Set it too short, and you miss legitimate matches. Set it too long, and state stores grow unbounded. In production, we typically use 5-15 minute windows based on business SLAs.
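The effect of the join window can be illustrated in plain Java, independent of Kafka. The sketch below (all names hypothetical) applies the same rule a windowed inner join uses: two events pair up only when they share a key and their timestamps differ by no more than the window size.

```java
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical, Kafka-free sketch of inner-join-within-window semantics.
public class JoinWindowSketch {
    record Event(String orderId, long timestampMs) {}

    // Emit a pair for every order/payment with the same key whose
    // timestamps differ by at most the window size.
    static List<Map.Entry<Event, Event>> innerJoin(
            List<Event> orders, List<Event> payments, Duration window) {
        return orders.stream()
            .flatMap(o -> payments.stream()
                .filter(p -> p.orderId().equals(o.orderId()))
                .filter(p -> Math.abs(p.timestampMs() - o.timestampMs())
                             <= window.toMillis())
                .map(p -> Map.entry(o, p)))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Event> orders = List.of(new Event("o1", 0), new Event("o2", 0));
        // o1's payment arrives 2 minutes later (inside the 5-minute window),
        // o2's arrives 10 minutes later (outside it), so only o1 joins.
        List<Event> payments = List.of(
            new Event("o1", 120_000), new Event("o2", 600_000));
        System.out.println(
            innerJoin(orders, payments, Duration.ofMinutes(5)).size()); // prints 1
    }
}
```

With a 10-minute window the second pair would also match, which is exactly the trade-off described above: wider windows catch more late matches at the cost of more retained state.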
Pattern 2: Windowing - Time-Bounded Analysis
The Use Case
A shipment tracking system needs to count how many packages are being processed per minute. But "per minute" can mean different things: strict minute boundaries (tumbling windows), overlapping intervals (hopping windows), or activity-based grouping (session windows).
Tumbling vs. Hopping Windows
The kafka-stream-window project implements both patterns with configurable parameters:
# Tumbling Window (non-overlapping)
# Window closes every 60 seconds, counts reset
--spring.cloud.stream.kafka.streams.timeWindow.length=60000
# Hopping Window (overlapping)
# Window length: 60s, advances every 40s
# Creates overlapping windows for smoother aggregations
--spring.cloud.stream.kafka.streams.timeWindow.length=60000
--spring.cloud.stream.kafka.streams.timeWindow.advanceBy=40000
Tumbling Window
|--W1--|--W2--|--W3--|
Fixed, non-overlapping intervals
Hopping Window
|----W1----|
    |----W2----|
        |----W3----|
Overlapping for smoother trends
@Bean
public Function<KStream<String, Shipment>, KStream<Windowed<String>, Long>>
        trackShipments() {
    return shipments -> shipments
        .groupByKey()
        .windowedBy(TimeWindows.of(Duration.ofMinutes(1))
            .advanceBy(Duration.ofSeconds(40)))
        .count(Materialized.as("shipment-counts"))
        .toStream()
        .peek((window, count) ->
            log.info("Window {} - {} shipments",
                window.window().startTime(), count));
}
Hopping windows are particularly useful for dashboards where you want smooth trend lines rather than jagged minute-by-minute counts. The overlap means each event contributes to multiple windows, providing a moving average effect.
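The window-assignment rule behind this overlap can be sketched in plain Java (class and method names are hypothetical): with the 60s/40s configuration above, windows start at every multiple of the advance, so a single event typically falls into two of them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Kafka-free sketch of hopping-window assignment:
// which windows of [start, start + size) contain a given timestamp?
public class HoppingWindowSketch {
    // Returns the start timestamps (ms) of every window containing ts,
    // where windows start at multiples of advanceMs.
    static List<Long> windowStartsFor(long ts, long sizeMs, long advanceMs) {
        List<Long> starts = new ArrayList<>();
        // Earliest candidate: the aligned start at or below ts - size, clamped to 0.
        long start = Math.max(0L, ((ts - sizeMs) / advanceMs) * advanceMs);
        for (; start <= ts; start += advanceMs) {
            if (ts < start + sizeMs) {
                starts.add(start);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // An event at t=50s with 60s windows advancing every 40s lands in
        // the windows starting at 0s and 40s.
        System.out.println(windowStartsFor(50_000, 60_000, 40_000)); // [0, 40000]
    }
}
```

Because each event contributes to every window it falls into, the per-window counts overlap, which is what produces the smoother trend lines described above.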
Pattern 3: Aggregations - Computing Running Statistics
The Use Case
Real-time analytics often requires computing aggregates: total revenue, average order value, count of events by category. Unlike batch processing, stream aggregations must update incrementally as each event arrives.
Stateful Stream Processing
The kafka-stream-aggregation project demonstrates how Kafka Streams maintains local state stores (backed by RocksDB) to enable fault-tolerant aggregations:
@Bean
public Function<KStream<String, Transaction>, KTable<String, TransactionStats>>
        aggregateTransactions() {
    return transactions -> transactions
        .groupBy((key, txn) -> txn.getCategory())
        .aggregate(
            TransactionStats::new,        // Initializer
            (key, txn, stats) -> {        // Aggregator
                stats.incrementCount();
                stats.addAmount(txn.getAmount());
                stats.updateAverage();
                return stats;
            },
            Materialized.<String, TransactionStats, KeyValueStore<Bytes, byte[]>>
                    as("transaction-stats-store")
                .withKeySerde(Serdes.String())
                .withValueSerde(transactionStatsSerde));
}
State Store Architecture
| Component | Purpose | Fault Tolerance |
|---|---|---|
| RocksDB | Local state storage | Persistent across restarts |
| Changelog Topic | State replication | Enables state recovery |
| Standby Replicas | Warm standby instances | Fast failover |
The beauty of this approach is that aggregations are automatically distributed across partitions. With 10 partitions and 5 application instances, each instance handles two partitions' worth of aggregation state, enabling horizontal scaling.
Pattern 4: Serverless Routing - Dynamic Function Dispatch
The Use Case
Modern architectures often separate event ingestion from processing logic. Teams want to deploy business functions independently without modifying the Kafka consumer infrastructure. This is the "bring your function to the cluster" paradigm.
The Router Architecture
The kafka-streams-router project bridges Kafka topics with OpenFaaS serverless functions. Functions register their interest in topics via annotations, and the router handles message delivery:
functions:
  order-processor:
    lang: java11
    handler: ./order-processor
    image: registry/order-processor:latest
    annotations:
      topic: orders-topic
  payment-handler:
    lang: java11
    handler: ./payment-handler
    image: registry/payment-handler:latest
    annotations:
      topic: payments-topic
Kafka Topic
Events arrive on topic
Stream Router
Reads @topic annotations
OpenFaaS Gateway
Invokes matching functions
Business Logic
Function processes event
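At its core, the dispatch step reduces to a lookup table built from the topic annotations above. The plain-Java sketch below (names hypothetical, hard-coded here for illustration; the real router reads them from the function metadata) shows that shape:

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the router's dispatch table: topic names
// (taken from the functions' annotations) map to subscribed functions.
public class RouterSketch {
    static final Map<String, List<String>> SUBSCRIPTIONS = Map.of(
        "orders-topic", List.of("order-processor"),
        "payments-topic", List.of("payment-handler"));

    // Returns the functions the gateway should invoke for an event's topic.
    static List<String> route(String topic) {
        return SUBSCRIPTIONS.getOrDefault(topic, List.of());
    }

    public static void main(String[] args) {
        System.out.println(route("orders-topic")); // prints [order-processor]
    }
}
```

Because the table is derived from annotations rather than code, a team can subscribe a new function to a topic by redeploying only that function, never the router.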
Zero Kubernetes YAML
The project uses jkube for Kubernetes deployment, eliminating manual YAML configuration:
# Build, push, and deploy in one command
mvn k8s:build k8s:push k8s:resource k8s:apply -Pkubernetes
# Test the router
curl "http://localhost:8081/router?message=test-event"
# Response shows routing result
{
  "message": "test-event",
  "topic": "orders-topic",
  "functions": ["order-processor"]
}
Technology Foundation
| Layer | Technology | Purpose |
|---|---|---|
| Framework | Spring Cloud Stream | Messaging abstraction with binder architecture |
| Stream Processing | Kafka Streams | Stateful transformations with exactly-once semantics |
| Message Broker | Apache Kafka | Distributed event log with partitioning |
| Serverless | OpenFaaS | Function deployment and scaling |
| Orchestration | Kubernetes (k3d) | Container orchestration and service discovery |
| Kafka on K8s | Strimzi | Kubernetes operator for Kafka clusters |
| Build & Deploy | jkube | Zero-YAML Kubernetes deployment from Maven |
Why Spring Cloud Stream?
Spring Cloud Stream provides a programming model that decouples business logic from messaging infrastructure. The same code can run against Kafka, RabbitMQ, or other binders by changing configuration - no code changes required.
# Switch from Kafka to RabbitMQ by changing the binder
spring:
  cloud:
    stream:
      bindings:
        processInput-in-0:
          destination: input-topic
          binder: kafka   # Change to 'rabbit' for RabbitMQ
        processOutput-out-0:
          destination: output-topic
          binder: kafka
Getting Started
All four projects follow a consistent setup pattern:
# 1. Start Kafka infrastructure
docker-compose up -d
# 2. Build the application
./mvnw clean package
# 3. Run with Spring Boot
java -jar target/kafka-stream-*.jar
# 4. For windowing, configure window parameters
java -jar target/kafka-stream-window-*.jar \
--spring.cloud.stream.kafka.streams.timeWindow.length=60000 \
--spring.cloud.stream.kafka.streams.timeWindow.advanceBy=40000
# 5. Verify output with Kafka console consumer
kafka-console-consumer \
  --bootstrap-server localhost:9092 \
  --topic output-topic \
  --property print.key=true \
  --property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \
  --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer
Pattern Selection Guide
Use Joins When
You need to correlate events from different sources - orders with payments, users with their actions, sensors with their metadata. Critical for building complete entity views from partial events.
Use Windows When
Business questions involve time boundaries - "events per minute," "rolling averages," "peak hours." Essential for dashboards, alerting thresholds, and trend analysis.
Use Aggregations When
You need running totals, counts, or statistics that update in real-time. Powers analytics dashboards, fraud detection counters, and inventory tracking.
Use Routing When
Multiple teams need to process the same event stream with different logic, or when you want to deploy processing functions independently of the ingestion layer.
Production Considerations
Key Production Settings
spring:
  cloud:
    stream:
      kafka:
        streams:
          binder:
            configuration:
              # Exactly-once semantics
              processing.guarantee: exactly_once_v2
              # Commit interval for state stores
              commit.interval.ms: 100
              # Number of standby replicas for HA
              num.standby.replicas: 1
          bindings:
            process-in-0:
              consumer:
                # Consumer group for load balancing
                application-id: my-stream-app
Explore the Code
Each project is available on GitHub with complete source code and Docker Compose configurations for local development.