The Problem: Why Traditional CRUD Falls Short
Most enterprise applications start with a simple CRUD model: Create, Read, Update, Delete. One database, one schema, one access pattern. This works until it doesn't.
- Read vs Write Contention — Heavy analytical queries compete with transactional writes for the same resources, degrading both
- Lost History — Traditional updates overwrite previous state. You know the current balance, but not how you got there
- Schema Rigidity — Optimizing for writes (normalized tables) conflicts with optimizing for reads (denormalized views)
- Audit Requirements — Financial and healthcare systems demand complete audit trails that simple UPDATE statements cannot provide
Consider a banking application: when a customer withdraws money, you need fast transactional consistency for the debit. But when they view their statement, you need optimized read performance across potentially millions of transactions. Forcing both through the same path creates bottlenecks.
Event Sourcing: The Source of Truth
Event Sourcing flips the traditional model. Instead of storing current state and losing history, you store the sequence of events that led to the current state. As Martin Fowler describes it:
> "All changes to an application state are stored as a sequence of events."
```sql
-- Traditional: overwrite state, lose history
UPDATE account SET balance = 900 WHERE id = 123;

-- Event sourced: append an immutable fact
INSERT INTO events (account_id, type, amount, timestamp)
VALUES (123, 'WITHDRAWAL', 100, NOW());
```
The current state is derived by replaying events. This provides complete auditability, temporal queries ("what was the balance last Tuesday?"), and the ability to rebuild read models if they become corrupted.
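Deriving state by replaying events is, in essence, a left fold over the event sequence. A minimal sketch (the `Event` record and type names are illustrative, not from the reference implementation):

```java
import java.util.List;

public class EventReplay {
    // Hypothetical event record: one row per change, never updated in place.
    public record Event(long accountId, String type, long amount) {}

    // Current balance is a left fold over the event sequence.
    public static long replayBalance(List<Event> events) {
        long balance = 0;
        for (Event e : events) {
            switch (e.type()) {
                case "DEPOSIT"    -> balance += e.amount();
                case "WITHDRAWAL" -> balance -= e.amount();
                default -> throw new IllegalArgumentException("unknown event: " + e.type());
            }
        }
        return balance;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Event(123, "DEPOSIT", 1000),
            new Event(123, "WITHDRAWAL", 100));
        System.out.println(replayBalance(events)); // 900, matching the example above
    }
}
```

A temporal query ("what was the balance last Tuesday?") is the same fold, stopped at the desired timestamp.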
CQRS: Separating Commands from Queries
CQRS (Command Query Responsibility Segregation) takes this further by physically separating write and read paths. The insight is simple but powerful: commands and queries have fundamentally different requirements.
| Commands (Writes) | Queries (Reads) |
|---|---|
| Imperative: "Withdraw $100" | Interrogative: "Show balance" |
| May fail validation | Never fail (the data either exists or it doesn't) |
| Require consistency | Tolerate eventual consistency |
| Optimized for writes | Optimized for reads |
| Normalized schemas | Denormalized for speed |
| Single source of truth | Multiple materialized views |
By separating these concerns, each path can be independently scaled and optimized. Write-heavy workloads get dedicated resources. Read-heavy analytics get their own optimized views.
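The separation can be sketched with a tiny in-memory model (hypothetical class, not from the reference implementation): the command mutates a normalized write log, while the query reads only a denormalized view that is maintained separately.

```java
import java.util.ArrayList;
import java.util.List;

public class CqrsSketch {
    private final List<Long> writeLog = new ArrayList<>();  // normalized source of truth
    private final List<Long> readView = new ArrayList<>();  // denormalized view

    // Command: imperative, may fail validation, returns no data.
    public void withdraw(long amount) {
        if (amount <= 0) throw new IllegalArgumentException("amount must be positive");
        writeLog.add(amount);
        readView.add(amount); // in the real system, CDC propagates this step asynchronously
    }

    // Query: interrogative, side-effect free, served from the read view.
    public List<Long> recentWithdrawals(int limit) {
        int from = Math.max(0, readView.size() - limit);
        return List.copyOf(readView.subList(from, readView.size()));
    }
}
```

The key design point is that `recentWithdrawals` never touches `writeLog`: each side can change its storage layout, indexing, and scaling strategy independently.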
The Solution: CDC as the Bridge
Here's the challenge: if writes and reads are separate, how do you keep them synchronized? Traditional approaches involve dual-writes (error-prone) or application-level event publishing (complex and tightly coupled).
Change Data Capture (CDC) solves this elegantly. Instead of your application publishing events, a CDC tool watches the database transaction log and captures changes automatically. Your application writes normally; CDC handles the rest.
1. **Command** — The application writes to MySQL (e.g., a debit card transaction)
2. **Capture** — Debezium reads the MySQL binlog via Kafka Connect
3. **Stream** — Changes are published to Kafka topics as events
4. **Process** — Spring Cloud Stream consumers update the read models
5. **Query** — Druid serves fast analytics on materialized views
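Each captured row change arrives on the Kafka topic as a structured event. A simplified sample of what the capture step produces for an inserted withdrawal row (the column names in `after` are illustrative; the envelope shape follows Debezium's change-event format, with `op: "c"` denoting a create/insert):

```json
{
  "payload": {
    "before": null,
    "after": { "id": 42, "card_id": 123, "amount": 100 },
    "source": { "db": "bank", "table": "debit_card" },
    "op": "c",
    "ts_ms": 1620000000000
  }
}
```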
Technology Stack Deep Dive
| Component | Technology | Role in Architecture |
|---|---|---|
| Transactional Store | MySQL | Primary database for command operations. Handles ACID transactions for writes. |
| Change Data Capture | Debezium + Kafka Connect | Monitors MySQL binlog, converts row changes to events, publishes to Kafka. |
| Event Streaming | Apache Kafka | Durable, ordered event log. Decouples producers from consumers. |
| Event Processing | Spring Cloud Stream | Java consumers that transform events and update read models. |
| Analytics Engine | Apache Druid | OLAP database optimized for fast aggregations and time-series queries. |
| Coordination | Zookeeper | Distributed coordination for Kafka cluster management. |
| Deployment | Docker | Containerized services for consistent local and production environments. |
Reference Implementation: Banking Application
The demonstration uses a realistic banking scenario to illustrate the pattern. A customer performs a debit card withdrawal. This single action flows through the entire architecture.
The Command Path: Processing Withdrawals
```java
// Application receives withdrawal request
@PostMapping("/withdraw")
public ResponseEntity<Void> withdraw(@RequestBody WithdrawalRequest request) {
    // Validate and execute business logic
    debitCardService.processWithdrawal(
        request.getCardId(),
        request.getAmount()
    );

    // MySQL INSERT into the debit_card table;
    // the transaction commit triggers CDC capture
    return ResponseEntity.ok().build();
}
```
The CDC Path: Automatic Event Capture
Debezium monitors MySQL's binary log. When the withdrawal commits, Debezium captures the change and publishes it to Kafka without any application code changes.
```json
{
  "name": "bank-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "184054",
    "database.server.name": "bankserver",
    "database.include.list": "bank",
    "table.include.list": "bank.debit_card",
    "include.schema.changes": "true"
  }
}
```
The Query Path: Building Read Models
Spring Cloud Stream consumers listen for CDC events and populate optimized read tables.
```java
@StreamListener(Sink.INPUT)
public void handleDebitCardEvent(DebitCardChangeEvent event) {
    if (event.getOperation() == Operation.INSERT) {
        // Transform and store in a read-optimized table
        MoneyWithdrawal withdrawal = MoneyWithdrawal.builder()
                .cardId(event.getCardId())
                .amount(event.getAmount())
                .timestamp(event.getTimestamp())
                .build();

        // Insert into the money_withdrawal table,
        // optimized for statement queries
        withdrawalRepository.save(withdrawal);

        // Also publish to Druid for analytics
        druidPublisher.publish(withdrawal);
    }
}
```
The Analytics Path: Real-Time Insights
Apache Druid consumes events for sub-second analytical queries across millions of transactions.
```sql
-- Mini statement: last 10 withdrawals for a card
SELECT timestamp, amount, merchant
FROM withdrawals
WHERE card_id = '4532-xxxx-xxxx-1234'
ORDER BY timestamp DESC
LIMIT 10;

-- Analytics: withdrawal patterns by hour
SELECT
  TIME_FLOOR(__time, 'PT1H') AS hour,
  COUNT(*) AS transaction_count,
  SUM(amount) AS total_amount
FROM withdrawals
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '24' HOUR
GROUP BY 1
ORDER BY 1;
```
Understanding the Data Flow
```
┌─────────────────────────────────────────────────────────────────────┐
│                            COMMAND SIDE                             │
│  ┌──────────┐      ┌──────────────┐      ┌─────────────────┐        │
│  │  Client  │───→  │ Spring Boot  │───→  │  MySQL (OLTP)   │        │
│  │ Request  │      │ Application  │      │   debit_card    │        │
│  └──────────┘      └──────────────┘      └────────┬────────┘        │
└───────────────────────────────────────────────────┼─────────────────┘
                                                    │ binlog
                                                    ▼
┌─────────────────────────────────────────────────────────────────────┐
│                              CDC LAYER                              │
│  ┌──────────────────────────────────────────────────────────┐       │
│  │        Kafka Connect + Debezium MySQL Connector          │       │
│  │  Reads binlog → Converts to events → Publishes to Kafka  │       │
│  └──────────────────────────────────────────────────────────┘       │
└───────────────────────────────────────────────────┬─────────────────┘
                                                    │ events
                                                    ▼
┌─────────────────────────────────────────────────────────────────────┐
│                           EVENT STREAMING                           │
│  ┌──────────────────────────────────────────────────────────┐       │
│  │                       Apache Kafka                       │       │
│  │             Topic: bankserver.bank.debit_card            │       │
│  └──────────────────────────────────────────────────────────┘       │
└───────────────────────┬─────────────────────────┬───────────────────┘
                        │                         │
                        ▼                         ▼
┌───────────────────────────────────┐ ┌───────────────────────────────┐
│        QUERY SIDE (OLTP)          │ │        ANALYTICS (OLAP)       │
│  ┌─────────────────────────────┐  │ │  ┌─────────────────────────┐  │
│  │     Spring Cloud Stream     │  │ │  │      Kafka Connect      │  │
│  │          Consumer           │  │ │  │      Druid Indexer      │  │
│  └──────────────┬──────────────┘  │ │  └───────────┬─────────────┘  │
│                 │                 │ │              │                │
│                 ▼                 │ │              ▼                │
│  ┌─────────────────────────────┐  │ │  ┌─────────────────────────┐  │
│  │            MySQL            │  │ │  │      Apache Druid       │  │
│  │   money_withdrawal table    │  │ │  │      Real-time OLAP     │  │
│  │      (read-optimized)       │  │ │  │    Sub-second queries   │  │
│  └─────────────────────────────┘  │ │  └─────────────────────────┘  │
└───────────────────────────────────┘ └───────────────────────────────┘
```
Why This Architecture Wins
Independent Scalability
Write-heavy workloads scale separately from read-heavy analytics. Add read replicas without affecting transactional performance.
Complete Auditability
Events are immutable facts. Replay them to reconstruct state at any point in time. Perfect for compliance and debugging.
Resilient Views
If a read model corrupts, rebuild it from the event log. Events are the source of truth; views are derived.
Zero Application Changes for CDC
Debezium reads the database log directly. Your application code stays simple. No dual-write complexity.
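The dual-write hazard that CDC eliminates can be shown with a small simulation (hypothetical classes, not from the reference implementation): the application writes to two systems without a shared transaction, the database commit succeeds, the broker publish fails, and the two stores silently diverge.

```java
import java.util.ArrayList;
import java.util.List;

public class DualWriteHazard {
    public static List<String> database = new ArrayList<>();
    public static List<String> eventStream = new ArrayList<>();

    // Dual-write: two systems, no shared transaction.
    public static void withdrawDualWrite(String event, boolean brokerUp) {
        database.add(event);          // database commit succeeds
        if (!brokerUp) return;        // broker publish fails: stores now diverge
        eventStream.add(event);
    }

    public static void main(String[] args) {
        withdrawDualWrite("WITHDRAWAL:100", true);
        withdrawDualWrite("WITHDRAWAL:50", false); // broker outage mid-flight
        // Divergence: the database has 2 rows, the stream only 1.
        System.out.println(database.size() + " vs " + eventStream.size());
    }
}
```

With CDC there is only one write, to the database; the event stream is derived from the committed transaction log, so it cannot diverge from what was actually stored.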
Understanding the Trade-offs
This architecture isn't free. Know what you're signing up for:
| Benefit | Cost |
|---|---|
| Scalability | Increased operational complexity |
| Auditability | More infrastructure to manage |
| Resilience | Eventual consistency (not immediate) |
| Flexibility | Steeper learning curve |
| Decoupling | No traditional ACID across aggregates |
Eventual Consistency: After a write, queries may return stale data until CDC propagates. For banking, withdrawals are immediately consistent (command side) but statements may lag slightly (query side). Design your UX accordingly.
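One common UX tactic for this lag is a bounded poll on the read side after a write ("read your writes"). A minimal sketch (hypothetical helper, not part of the reference implementation):

```java
import java.util.function.Supplier;

public class ReadYourWrites {
    // Poll the (eventually consistent) read model until it reflects the
    // expected balance, giving up after maxAttempts. Returns true if the
    // write became visible within the attempt budget.
    public static boolean awaitVisible(Supplier<Long> readBalance,
                                       long expected,
                                       int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            if (readBalance.get() == expected) return true;
            // A real client would back off here (e.g. Thread.sleep) before retrying.
        }
        return false;
    }
}
```

If the budget is exhausted, the UI can fall back to "your withdrawal is being processed" rather than showing a stale balance as current.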
Operational Overhead: You're now running Kafka, Zookeeper, Debezium, and potentially Druid. Each needs monitoring, scaling, and maintenance. Use this pattern when benefits outweigh operational costs.
When to Apply This Pattern
Good Fit
- High read-to-write ratios where reads can tolerate eventual consistency
- Audit requirements demanding complete history
- Analytics workloads competing with transactional systems
- Microservices needing to react to database changes without tight coupling
- Legacy database integration without modifying application code
Poor Fit
- Simple CRUD applications without scaling or audit needs
- Strong consistency requirements where eventual consistency is unacceptable
- Small teams without DevOps capacity for distributed systems
- Rapid prototyping where simplicity trumps architecture
Getting Started
The reference implementation provides a complete Docker Compose setup. Clone the repository and run:
```bash
# Clone the repository
git clone https://github.com/mgorav/CqrsWithCDC.git
cd CqrsWithCDC

# Start all services:
#   - MySQL (port 3306)
#   - Kafka + Zookeeper (ports 9092, 2181)
#   - Kafka Connect + Debezium (port 8083)
#   - Spring Boot application (port 8080)
#   - Druid (port 8888)
docker-compose up -d

# Register the Debezium connector
curl -X POST -H "Content-Type: application/json" \
  --data @connector-config.json \
  http://localhost:8083/connectors

# Test a withdrawal
curl -X POST http://localhost:8080/withdraw \
  -H "Content-Type: application/json" \
  -d '{"cardId": "4532-1234-5678-9012", "amount": 100}'

# Query the read model
curl http://localhost:8080/statements/4532-1234-5678-9012
```
Explore the Code
The complete implementation with Docker Compose, Debezium connectors, Spring Cloud Stream consumers, and Druid integration is available on GitHub.