DS-301e · Module 1
Streaming Data Architecture
3 min read
Streaming architecture processes data as it arrives rather than in scheduled batches. An event occurs (a deal moves stages, a user clicks, a transaction completes), the event is published to a stream, a stream processor transforms it, and the dashboard reflects the change within seconds.

The architecture: event producers publish to a message broker (Kafka, Kinesis, Pub/Sub). Stream processors consume, transform, and aggregate. The aggregated results are written to a real-time data store (Redis, ClickHouse, Druid). The dashboard queries the real-time store. Each component is designed for throughput and low latency.

The trade-off: streaming architecture is more complex and more expensive than batch. It is justified only when the decision cadence demands it.
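The flow above can be sketched in miniature. This is an in-process toy, not a production design: the queue stands in for the broker, a dict stands in for the real-time store, and all names (`publish`, `process_available`, `dashboard_query`) are hypothetical.

```python
import queue

broker = queue.Queue()   # stands in for Kafka / Kinesis / Pub/Sub
real_time_store = {}     # stands in for Redis / ClickHouse / Druid

def publish(event):
    """Producer: publish an event to the broker."""
    broker.put(event)

def process_available():
    """Stream processor: consume, aggregate, write to the store."""
    while not broker.empty():
        event = broker.get()
        key = event["metric"]
        real_time_store[key] = real_time_store.get(key, 0) + event["value"]

def dashboard_query(metric):
    """Dashboard: read the pre-aggregated value, no raw-event scan."""
    return real_time_store.get(metric, 0)

# A deal moves stages; a transaction completes.
publish({"metric": "deals_moved", "value": 1})
publish({"metric": "revenue", "value": 250})
process_available()
print(dashboard_query("revenue"))   # -> 250
```

The point of the shape: the dashboard never touches raw events. Aggregation happens in the processor, so the dashboard query stays a cheap key lookup regardless of event volume.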
Do This
- Use streaming only for dashboards where the decision cadence is intraday or faster
- Build the streaming pipeline with replay capability — when something goes wrong, you need to reprocess
- Monitor streaming lag as a system health metric — a stream that falls behind is a dashboard that lies
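Lag monitoring from the last bullet reduces to comparing the producer's latest offset against the consumer's committed offset, per partition. A minimal sketch, with hypothetical offset maps and threshold:

```python
def consumer_lag(latest_offsets, committed_offsets):
    """Lag per partition: how far the processor trails the producer.
    Inputs are hypothetical {partition: offset} maps."""
    return {p: latest_offsets[p] - committed_offsets.get(p, 0)
            for p in latest_offsets}

def lag_alert(lag_by_partition, threshold):
    """Partitions whose lag exceeds the health threshold."""
    return [p for p, lag in lag_by_partition.items() if lag > threshold]

latest = {"p0": 1_000, "p1": 2_500}
committed = {"p0": 990, "p1": 1_200}
lag = consumer_lag(latest, committed)   # {"p0": 10, "p1": 1300}
print(lag_alert(lag, threshold=500))    # -> ["p1"]
```

A partition stuck at high lag is exactly the "dashboard that lies" case: the store is being read, but it reflects data from minutes or hours ago.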
Avoid This
- Building streaming pipelines for metrics that are reviewed weekly — batch is cheaper and simpler
- Assuming streaming means instant — every component adds latency, and the total must stay within the requirement
- Deploying streaming without monitoring — an unmonitored stream fails silently and the dashboard shows stale data
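The "streaming means instant" trap is worth making concrete: end-to-end latency is the sum of every hop, and it is the total that must meet the requirement. A sketch with invented per-component numbers — the stage names and latencies are illustrative assumptions, not benchmarks:

```python
def end_to_end_latency(stage_latencies_ms):
    """Total latency is the sum of every hop; no stage is free."""
    return sum(stage_latencies_ms.values())

# Hypothetical per-component latencies for one pipeline (ms):
stages = {
    "producer_publish": 20,
    "broker_delivery": 50,
    "stream_processing": 300,
    "store_write": 40,
    "dashboard_query": 150,
}
requirement_ms = 1_000  # assumed "within seconds" dashboard requirement
total = end_to_end_latency(stages)
print(total, total <= requirement_ms)   # -> 560 True
```

Budgeting this way also shows where tuning pays off: here the processor dominates, so shaving broker delivery would barely move the total.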