SA-301e · Module 2

Stream Processing Patterns


Stream processing treats data as a continuous flow rather than a bounded set. Events arrive, are processed, and produce outputs in real time — or near-real time. The architectural shift from batch to streaming is not incremental. It changes how you think about state, time, and completeness. In batch processing, you process all the data and produce a result. In stream processing, you process each event as it arrives and the result is always provisional — more events may change it.

  1. Stateless Processing: Each event is processed independently — filter, transform, route. No state is maintained between events. Stateless processing scales linearly by adding processors. Use it for data enrichment (add a field from a lookup table), filtering (drop events that do not match criteria), and format transformation (convert XML to JSON).
  2. Stateful Processing: Processing depends on accumulated state — running totals, session windows, pattern detection across events. State must be managed explicitly: where is it stored, how is it checkpointed, what happens when a processor fails and restarts? Stateful processing is more powerful and significantly more complex to operate.
  3. Windowed Aggregation: Aggregate events within time windows: tumbling windows (fixed, non-overlapping), sliding windows (overlapping), and session windows (activity-driven). "Average order value in the last 5 minutes" is a sliding window. "Total revenue per hour" is a tumbling window. "User session activity until 30 minutes of inactivity" is a session window. The window type determines the aggregation semantics.
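Stateless processing (pattern 1) can be sketched as a pipeline of pure functions. This is a minimal illustration, not any particular framework's API; the event shape, the `keep`/`enrich` helpers, and the lookup table are all hypothetical:

```python
# Stateless processing sketch: each event is filtered and enriched
# independently, so no state survives between events and more
# processors can be added to scale out.

COUNTRY_BY_IP_PREFIX = {"10.": "internal", "203.": "AU"}  # hypothetical lookup table

def keep(event: dict) -> bool:
    """Filtering: drop events that do not match criteria."""
    return event["status"] == "ok"

def enrich(event: dict) -> dict:
    """Enrichment: add a field from a lookup table."""
    prefix = event["ip"].split(".")[0] + "."
    return {**event, "region": COUNTRY_BY_IP_PREFIX.get(prefix, "unknown")}

def process(stream):
    # Filter, then enrich, event by event; order of events is irrelevant.
    return [enrich(e) for e in stream if keep(e)]

events = [
    {"ip": "10.0.0.1", "status": "ok"},
    {"ip": "203.0.113.9", "status": "error"},  # dropped by keep()
]
print(process(events))
# → [{'ip': '10.0.0.1', 'status': 'ok', 'region': 'internal'}]
```

Because each call touches only its own event, any instance of the processor can handle any event — that independence is what makes the linear scaling possible.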
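Stateful processing (pattern 2) can be sketched with a running total whose state is explicit, so the "checkpoint and restart" questions have concrete answers. The class and its serialization format are illustrative assumptions; real systems checkpoint to durable storage:

```python
import json

class RunningTotal:
    """Stateful processor: a running total per key.

    State lives in one explicit place (self.totals) so it can be
    checkpointed, and rebuilt when a failed processor restarts.
    """

    def __init__(self, state=None):
        self.totals = state or {}

    def process(self, event):
        key = event["user"]
        self.totals[key] = self.totals.get(key, 0) + event["amount"]
        return self.totals[key]

    def checkpoint(self):
        # Snapshot the state; a real system writes this to durable storage.
        return json.dumps(self.totals)

    @classmethod
    def restore(cls, snapshot):
        return cls(state=json.loads(snapshot))

p = RunningTotal()
p.process({"user": "a", "amount": 10})
snap = p.checkpoint()

# Processor fails and restarts: rebuild state from the last checkpoint,
# then continue. Events between checkpoint and failure would be replayed.
p2 = RunningTotal.restore(snap)
print(p2.process({"user": "a", "amount": 5}))  # → 15
```

The operational complexity the text mentions lives in the gap between `checkpoint()` calls: deciding how often to snapshot, and how to replay or deduplicate the events that arrived after the last one.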
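The tumbling-window case (pattern 3) reduces to bucketing each event by `floor(timestamp / window_size)`. A minimal sketch of "total revenue per hour", with illustrative event tuples of `(timestamp_seconds, amount)`:

```python
from collections import defaultdict

def tumbling_sum(events, window_size):
    """Aggregate (timestamp, value) events into fixed, non-overlapping
    windows of `window_size` seconds, keyed by each window's start time."""
    windows = defaultdict(float)
    for ts, value in events:
        window_start = (ts // window_size) * window_size
        windows[window_start] += value
    return dict(windows)

# "Total revenue per hour": two events in hour 0, one in hour 1.
events = [(10, 5.0), (1800, 7.5), (3700, 2.5)]
print(tumbling_sum(events, 3600))  # → {0: 12.5, 3600: 2.5}
```

Sliding and session windows differ only in the bucketing rule: a sliding window assigns each event to every window it overlaps, and a session window extends its bucket until a gap of inactivity (30 minutes in the example above) closes it.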