The Problem: Why Traditional CRUD Falls Short
Most enterprise applications start with a simple CRUD model: Create, Read, Update, Delete. One database, one schema, one access pattern. This works until it doesn't.
- Read vs Write Contention — Heavy analytical queries compete with transactional writes for the same resources, degrading both
- Lost History — Traditional updates overwrite previous state. You know the current balance, but not how you got there
- Schema Rigidity — Optimizing for writes (normalized tables) conflicts with optimizing for reads (denormalized views)
- Audit Requirements — Financial and healthcare systems demand complete audit trails that simple UPDATE statements cannot provide
Consider a banking application: when a customer withdraws money, you need fast transactional consistency for the debit. But when they view their statement, you need optimized read performance across potentially millions of transactions. Forcing both through the same path creates bottlenecks.
Event Sourcing: The Source of Truth
Event Sourcing flips the traditional model. Instead of storing current state and losing history, you store the sequence of events that led to the current state. As Martin Fowler describes it:
> "All changes to an application state are stored as a sequence of events."
```sql
-- Traditional: overwrite state, lose history
UPDATE account SET balance = 900 WHERE id = 123;

-- Event sourced: append an immutable fact
INSERT INTO events (account_id, type, amount, timestamp)
VALUES (123, 'WITHDRAWAL', 100, NOW());
```
The current state is derived by replaying events. This provides complete auditability, temporal queries ("what was the balance last Tuesday?"), and the ability to rebuild read models if they become corrupted.
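Deriving state by replaying events is, in essence, a left fold over the event sequence. A minimal sketch (the `Event` record and type names are illustrative, not from the reference implementation):

```java
import java.util.List;

public class EventReplay {
    // Hypothetical event record: one row per change, never updated in place.
    public record Event(long accountId, String type, long amount) {}

    // Current balance is a left fold over the event sequence.
    public static long replayBalance(List<Event> events) {
        long balance = 0;
        for (Event e : events) {
            switch (e.type()) {
                case "DEPOSIT"    -> balance += e.amount();
                case "WITHDRAWAL" -> balance -= e.amount();
                default -> throw new IllegalArgumentException("unknown event: " + e.type());
            }
        }
        return balance;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Event(123, "DEPOSIT", 1000),
            new Event(123, "WITHDRAWAL", 100));
        System.out.println(replayBalance(events)); // 900, matching the example above
    }
}
```

A temporal query ("what was the balance last Tuesday?") is the same fold, stopped at the desired timestamp.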
CQRS: Separating Commands from Queries
CQRS (Command Query Responsibility Segregation) takes this further by physically separating write and read paths. The insight is simple but powerful: commands and queries have fundamentally different requirements.
| Commands (Writes) | Queries (Reads) |
|---|---|
| Imperative: "Withdraw $100" | Interrogative: "Show balance" |
| May fail validation | Never fail (the data either exists or it doesn't) |
| Require consistency | Tolerate eventual consistency |
| Optimized for writes | Optimized for reads |
| Normalized schemas | Denormalized for speed |
| Single source of truth | Multiple materialized views |
By separating these concerns, each path can be independently scaled and optimized. Write-heavy workloads get dedicated resources. Read-heavy analytics get their own optimized views.
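The separation can be sketched with a tiny in-memory model (hypothetical class, not from the reference implementation): the command mutates a normalized write log, while the query reads only a denormalized view that is maintained separately.

```java
import java.util.ArrayList;
import java.util.List;

public class CqrsSketch {
    private final List<Long> writeLog = new ArrayList<>();  // normalized source of truth
    private final List<Long> readView = new ArrayList<>();  // denormalized view

    // Command: imperative, may fail validation, returns no data.
    public void withdraw(long amount) {
        if (amount <= 0) throw new IllegalArgumentException("amount must be positive");
        writeLog.add(amount);
        readView.add(amount); // in the real system, CDC propagates this step asynchronously
    }

    // Query: interrogative, side-effect free, served from the read view.
    public List<Long> recentWithdrawals(int limit) {
        int from = Math.max(0, readView.size() - limit);
        return List.copyOf(readView.subList(from, readView.size()));
    }
}
```

The key design point is that `recentWithdrawals` never touches `writeLog`: each side can change its storage layout, indexing, and scaling strategy independently.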
The Solution: CDC as the Bridge
Here's the challenge: if writes and reads are separate, how do you keep them synchronized? Traditional approaches involve dual-writes (error-prone) or application-level event publishing (complex and tightly coupled).
Change Data Capture (CDC) solves this elegantly. Instead of your application publishing events, a CDC tool watches the database transaction log and captures changes automatically. Your application writes normally; CDC handles the rest.
1. **Command** — The application writes to MySQL (e.g., a debit card transaction)
2. **Capture** — Debezium reads the MySQL binlog via Kafka Connect
3. **Stream** — Changes are published to Kafka topics as events
4. **Process** — Spring Cloud Stream consumers update the read models
5. **Query** — Druid serves fast analytics on materialized views
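Each captured row change arrives on the Kafka topic as a structured event. A simplified sample of what the capture step produces for an inserted withdrawal row (the column names in `after` are illustrative; the envelope shape follows Debezium's change-event format, with `op: "c"` denoting a create/insert):

```json
{
  "payload": {
    "before": null,
    "after": { "id": 42, "card_id": 123, "amount": 100 },
    "source": { "db": "bank", "table": "debit_card" },
    "op": "c",
    "ts_ms": 1620000000000
  }
}
```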
Technology Stack Deep Dive
| Component | Technology | Role in Architecture |
|---|---|---|
| Transactional Store | MySQL | Primary database for command operations. Handles ACID transactions for writes. |
| Change Data Capture | Debezium + Kafka Connect | Monitors MySQL binlog, converts row changes to events, publishes to Kafka. |
| Event Streaming | Apache Kafka | Durable, ordered event log. Decouples producers from consumers. |
| Event Processing | Spring Cloud Stream | Java consumers that transform events and update read models. |
| Analytics Engine | Apache Druid | OLAP database optimized for fast aggregations and time-series queries. |
| Coordination | Zookeeper | Distributed coordination for Kafka cluster management. |
| Deployment | Docker | Containerized services for consistent local and production environments. |
Reference Implementation: Banking Application
The demonstration uses a realistic banking scenario to illustrate the pattern. A customer performs a debit card withdrawal. This single action flows through the entire architecture.
The Command Path: Processing Withdrawals
```java
// Application receives withdrawal request
@PostMapping("/withdraw")
public ResponseEntity<Void> withdraw(@RequestBody WithdrawalRequest request) {
    // Validate and execute business logic
    debitCardService.processWithdrawal(
        request.getCardId(),
        request.getAmount()
    );

    // MySQL INSERT into the debit_card table;
    // the transaction commit triggers CDC capture
    return ResponseEntity.ok().build();
}
```
The CDC Path: Automatic Event Capture
Debezium monitors MySQL's binary log. When the withdrawal commits, Debezium captures the change and publishes it to Kafka without any application code changes.
```json
{
  "name": "bank-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "184054",
    "database.server.name": "bankserver",
    "database.include.list": "bank",
    "table.include.list": "bank.debit_card",
    "include.schema.changes": "true"
  }
}
```
The Query Path: Building Read Models
Spring Cloud Stream consumers listen for CDC events and populate optimized read tables.
```java
@StreamListener(Sink.INPUT)
public void handleDebitCardEvent(DebitCardChangeEvent event) {
    if (event.getOperation() == Operation.INSERT) {
        // Transform and store in a read-optimized table
        MoneyWithdrawal withdrawal = MoneyWithdrawal.builder()
                .cardId(event.getCardId())
                .amount(event.getAmount())
                .timestamp(event.getTimestamp())
                .build();

        // Insert into the money_withdrawal table,
        // optimized for statement queries
        withdrawalRepository.save(withdrawal);

        // Also publish to Druid for analytics
        druidPublisher.publish(withdrawal);
    }
}
```
The Analytics Path: Real-Time Insights
Apache Druid consumes events for sub-second analytical queries across millions of transactions.
```sql
-- Mini statement: last 10 withdrawals for a card
SELECT timestamp, amount, merchant
FROM withdrawals
WHERE card_id = '4532-xxxx-xxxx-1234'
ORDER BY timestamp DESC
LIMIT 10;

-- Analytics: withdrawal patterns by hour
SELECT
  TIME_FLOOR(__time, 'PT1H') AS hour,
  COUNT(*) AS transaction_count,
  SUM(amount) AS total_amount
FROM withdrawals
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '24' HOUR
GROUP BY 1
ORDER BY 1;
```
Understanding the Data Flow
```
┌─────────────────────────────────────────────────────────────────────┐
│                            COMMAND SIDE                             │
│  ┌──────────┐      ┌──────────────┐      ┌─────────────────┐        │
│  │  Client  │───→  │ Spring Boot  │───→  │  MySQL (OLTP)   │        │
│  │ Request  │      │ Application  │      │   debit_card    │        │
│  └──────────┘      └──────────────┘      └────────┬────────┘        │
└───────────────────────────────────────────────────┼─────────────────┘
                                                    │ binlog
                                                    ▼
┌─────────────────────────────────────────────────────────────────────┐
│                              CDC LAYER                              │
│  ┌──────────────────────────────────────────────────────────┐       │
│  │        Kafka Connect + Debezium MySQL Connector          │       │
│  │  Reads binlog → Converts to events → Publishes to Kafka  │       │
│  └──────────────────────────────────────────────────────────┘       │
└───────────────────────────────────────────────────┬─────────────────┘
                                                    │ events
                                                    ▼
┌─────────────────────────────────────────────────────────────────────┐
│                           EVENT STREAMING                           │
│  ┌──────────────────────────────────────────────────────────┐       │
│  │                       Apache Kafka                       │       │
│  │             Topic: bankserver.bank.debit_card            │       │
│  └──────────────────────────────────────────────────────────┘       │
└───────────────────────┬─────────────────────────┬───────────────────┘
                        │                         │
                        ▼                         ▼
┌───────────────────────────────────┐ ┌───────────────────────────────┐
│        QUERY SIDE (OLTP)          │ │        ANALYTICS (OLAP)       │
│  ┌─────────────────────────────┐  │ │  ┌─────────────────────────┐  │
│  │     Spring Cloud Stream     │  │ │  │      Kafka Connect      │  │
│  │          Consumer           │  │ │  │      Druid Indexer      │  │
│  └──────────────┬──────────────┘  │ │  └───────────┬─────────────┘  │
│                 │                 │ │              │                │
│                 ▼                 │ │              ▼                │
│  ┌─────────────────────────────┐  │ │  ┌─────────────────────────┐  │
│  │            MySQL            │  │ │  │      Apache Druid       │  │
│  │   money_withdrawal table    │  │ │  │      Real-time OLAP     │  │
│  │      (read-optimized)       │  │ │  │    Sub-second queries   │  │
│  └─────────────────────────────┘  │ │  └─────────────────────────┘  │
└───────────────────────────────────┘ └───────────────────────────────┘
```
Why This Architecture Wins
Independent Scalability
Write-heavy workloads scale separately from read-heavy analytics. Add read replicas without affecting transactional performance.
Complete Auditability
Events are immutable facts. Replay them to reconstruct state at any point in time. Perfect for compliance and debugging.
Resilient Views
If a read model corrupts, rebuild it from the event log. Events are the source of truth; views are derived.
Zero Application Changes for CDC
Debezium reads the database log directly. Your application code stays simple. No dual-write complexity.
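The dual-write hazard that CDC eliminates can be shown with a small simulation (hypothetical classes, not from the reference implementation): the application writes to two systems without a shared transaction, the database commit succeeds, the broker publish fails, and the two stores silently diverge.

```java
import java.util.ArrayList;
import java.util.List;

public class DualWriteHazard {
    public static List<String> database = new ArrayList<>();
    public static List<String> eventStream = new ArrayList<>();

    // Dual-write: two systems, no shared transaction.
    public static void withdrawDualWrite(String event, boolean brokerUp) {
        database.add(event);          // database commit succeeds
        if (!brokerUp) return;        // broker publish fails: stores now diverge
        eventStream.add(event);
    }

    public static void main(String[] args) {
        withdrawDualWrite("WITHDRAWAL:100", true);
        withdrawDualWrite("WITHDRAWAL:50", false); // broker outage mid-flight
        // Divergence: the database has 2 rows, the stream only 1.
        System.out.println(database.size() + " vs " + eventStream.size());
    }
}
```

With CDC there is only one write, to the database; the event stream is derived from the committed transaction log, so it cannot diverge from what was actually stored.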
Understanding the Trade-offs
This architecture isn't free. Know what you're signing up for:
| Benefit | Cost |
|---|---|
| Scalability | Increased operational complexity |
| Auditability | More infrastructure to manage |
| Resilience | Eventual consistency (not immediate) |
| Flexibility | Steeper learning curve |
| Decoupling | No traditional ACID across aggregates |
Eventual Consistency: After a write, queries may return stale data until CDC propagates. For banking, withdrawals are immediately consistent (command side) but statements may lag slightly (query side). Design your UX accordingly.
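One common UX tactic for this lag is a bounded poll on the read side after a write ("read your writes"). A minimal sketch (hypothetical helper, not part of the reference implementation):

```java
import java.util.function.Supplier;

public class ReadYourWrites {
    // Poll the (eventually consistent) read model until it reflects the
    // expected balance, giving up after maxAttempts. Returns true if the
    // write became visible within the attempt budget.
    public static boolean awaitVisible(Supplier<Long> readBalance,
                                       long expected,
                                       int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            if (readBalance.get() == expected) return true;
            // A real client would back off here (e.g. Thread.sleep) before retrying.
        }
        return false;
    }
}
```

If the budget is exhausted, the UI can fall back to "your withdrawal is being processed" rather than showing a stale balance as current.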
Operational Overhead: You're now running Kafka, Zookeeper, Debezium, and potentially Druid. Each needs monitoring, scaling, and maintenance. Use this pattern when benefits outweigh operational costs.
When to Apply This Pattern
Good Fit
- High read-to-write ratios where reads can tolerate eventual consistency
- Audit requirements demanding complete history
- Analytics workloads competing with transactional systems
- Microservices needing to react to database changes without tight coupling
- Legacy database integration without modifying application code
Poor Fit
- Simple CRUD applications without scaling or audit needs
- Strong consistency requirements where eventual consistency is unacceptable
- Small teams without DevOps capacity for distributed systems
- Rapid prototyping where simplicity trumps architecture
Getting Started
The reference implementation provides a complete Docker Compose setup. Clone the repository and run:
```bash
# Clone the repository
git clone https://github.com/mgorav/CqrsWithCDC.git
cd CqrsWithCDC

# Start all services:
#   - MySQL (port 3306)
#   - Kafka + Zookeeper (ports 9092, 2181)
#   - Kafka Connect + Debezium (port 8083)
#   - Spring Boot application (port 8080)
#   - Druid (port 8888)
docker-compose up -d

# Register the Debezium connector
curl -X POST -H "Content-Type: application/json" \
  --data @connector-config.json \
  http://localhost:8083/connectors

# Test a withdrawal
curl -X POST http://localhost:8080/withdraw \
  -H "Content-Type: application/json" \
  -d '{"cardId": "4532-1234-5678-9012", "amount": 100}'

# Query the read model
curl http://localhost:8080/statements/4532-1234-5678-9012
```
Explore the Code
The complete implementation with Docker Compose, Debezium connectors, Spring Cloud Stream consumers, and Druid integration is available on GitHub.