DS-301e · Module 1
Streaming Data Architecture
3 min read
Streaming architecture processes data as it arrives rather than in scheduled batches. An event occurs (a deal moves stages, a user clicks, a transaction completes), the event is published to a stream, a stream processor transforms it, and the dashboard reflects the change within seconds.

The architecture: event producers publish to a message broker (Kafka, Kinesis, Pub/Sub). Stream processors consume, transform, and aggregate. The aggregated results are written to a real-time data store (Redis, ClickHouse, Druid). The dashboard queries the real-time store. Each component is designed for throughput and low latency.

The trade-off: streaming architecture is more complex and more expensive than batch. It is justified only when the decision cadence demands it.
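The flow above can be sketched in miniature. This is an in-process toy, not a production design: the queue stands in for the broker, a dict stands in for the real-time store, and all names (`publish`, `process_available`, `dashboard_query`) are hypothetical.

```python
import queue

broker = queue.Queue()   # stands in for Kafka / Kinesis / Pub/Sub
real_time_store = {}     # stands in for Redis / ClickHouse / Druid

def publish(event):
    """Producer: publish an event to the broker."""
    broker.put(event)

def process_available():
    """Stream processor: consume, aggregate, write to the store."""
    while not broker.empty():
        event = broker.get()
        key = event["metric"]
        real_time_store[key] = real_time_store.get(key, 0) + event["value"]

def dashboard_query(metric):
    """Dashboard: read the pre-aggregated value, no raw-event scan."""
    return real_time_store.get(metric, 0)

# A deal moves stages; a transaction completes.
publish({"metric": "deals_moved", "value": 1})
publish({"metric": "revenue", "value": 250})
process_available()
print(dashboard_query("revenue"))   # -> 250
```

The point of the shape: the dashboard never touches raw events. Aggregation happens in the processor, so the dashboard query stays a cheap key lookup regardless of event volume.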
Do This
- Use streaming only for dashboards where the decision cadence is intraday or faster
- Build the streaming pipeline with replay capability — when something goes wrong, you need to reprocess
- Monitor streaming lag as a system health metric — a stream that falls behind is a dashboard that lies
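Lag monitoring from the last bullet reduces to comparing the producer's latest offset against the consumer's committed offset, per partition. A minimal sketch, with hypothetical offset maps and threshold:

```python
def consumer_lag(latest_offsets, committed_offsets):
    """Lag per partition: how far the processor trails the producer.
    Inputs are hypothetical {partition: offset} maps."""
    return {p: latest_offsets[p] - committed_offsets.get(p, 0)
            for p in latest_offsets}

def lag_alert(lag_by_partition, threshold):
    """Partitions whose lag exceeds the health threshold."""
    return [p for p, lag in lag_by_partition.items() if lag > threshold]

latest = {"p0": 1_000, "p1": 2_500}
committed = {"p0": 990, "p1": 1_200}
lag = consumer_lag(latest, committed)   # {"p0": 10, "p1": 1300}
print(lag_alert(lag, threshold=500))    # -> ["p1"]
```

A partition stuck at high lag is exactly the "dashboard that lies" case: the store is being read, but it reflects data from minutes or hours ago.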
Avoid This
- Building streaming pipelines for metrics that are reviewed weekly — batch is cheaper and simpler
- Assuming streaming means instant — every component adds latency, and the total must stay within the requirement
- Deploying streaming without monitoring — an unmonitored stream fails silently and the dashboard shows stale data
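The "streaming means instant" trap is worth making concrete: end-to-end latency is the sum of every hop, and it is the total that must meet the requirement. A sketch with invented per-component numbers — the stage names and latencies are illustrative assumptions, not benchmarks:

```python
def end_to_end_latency(stage_latencies_ms):
    """Total latency is the sum of every hop; no stage is free."""
    return sum(stage_latencies_ms.values())

# Hypothetical per-component latencies for one pipeline (ms):
stages = {
    "producer_publish": 20,
    "broker_delivery": 50,
    "stream_processing": 300,
    "store_write": 40,
    "dashboard_query": 150,
}
requirement_ms = 1_000  # assumed "within seconds" dashboard requirement
total = end_to_end_latency(stages)
print(total, total <= requirement_ms)   # -> 560 True
```

Budgeting this way also shows where tuning pays off: here the processor dominates, so shaving broker delivery would barely move the total.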