SA-201b · Module 2

Data Flow Mapping

Data flow mapping is the architecture equivalent of a plumbing diagram. It shows where data originates, how it moves between systems, where it is transformed, where it is stored, and where it is consumed. Without a data flow map, integration debugging is guesswork. With one, you can trace any data anomaly from source to destination and identify the exact point where the flow breaks.
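To make the tracing idea concrete, here is a minimal sketch that treats a data flow map as a directed graph and walks it backwards from the point where an anomaly shows up. The node names (crm.customers, etl.normalize_names, dashboard.revenue, and so on) and the function upstream_of are hypothetical, chosen only for illustration. A real map would also carry details about each hop.

```python
from collections import deque

# Hypothetical flow map: each key is a node, each value lists the nodes it feeds.
FLOWS = {
    "crm.customers": ["etl.normalize_names"],
    "etl.normalize_names": ["warehouse.customers"],
    "erp.invoices": ["etl.convert_currency"],
    "etl.convert_currency": ["warehouse.invoices"],
    "warehouse.customers": ["dashboard.revenue"],
    "warehouse.invoices": ["dashboard.revenue"],
}


def upstream_of(target, flows):
    """Return every node that feeds `target`, directly or indirectly, nearest first."""
    # Invert the edges so we can walk from a consumer back toward its sources.
    parents = {}
    for src, dests in flows.items():
        for dest in dests:
            parents.setdefault(dest, []).append(src)

    seen, order = set(), []
    queue = deque([target])
    while queue:  # breadth-first walk, one hop further upstream each round
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


# Every hop that could have introduced a bad value on the dashboard.
suspects = upstream_of("dashboard.revenue", FLOWS)
```

If an anomaly appears in dashboard.revenue, the returned list is the set of places to inspect, ordered from the nearest hop back to the original sources.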

  1. Identify Data Sources. Where does each piece of data originate? The CRM holds customer records. The ERP holds financial data. The AI model produces inference outputs. Each source is a starting point on the map. If you cannot identify the authoritative source for a piece of data, you have a data ownership problem that the architecture must resolve.
  2. Map Transformations. Data rarely moves between systems in the same format. Customer names are concatenated. Dates are reformatted. Currencies are converted. Each transformation is a point on the map where data changes shape, and each transformation is a potential source of data quality issues. Document every transformation with the logic, the trigger, and the error handling.
  3. Identify Storage Points. Where is data persisted, and for how long? Databases, caches, message queues, file systems, and third-party services all store data. Each storage point has retention, security, and compliance implications. The data flow map must show not just where data moves but where it rests.
  4. Document Consumption. Who reads the data at each storage point? What decisions are made based on it? Data that is stored but never consumed is either waste or a latent capability. Data that is consumed by critical processes needs higher reliability guarantees than data consumed by reporting dashboards.
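The four steps above can be sketched as a small data model with an audit pass over it. This is an illustrative sketch under stated assumptions, not a prescribed tool: the Node class, the audit function, the REQUIRED_DOCS fields, and every node name are hypothetical. The checks mirror the text: data with no identifiable source, transformations missing their logic, trigger, or error handling, storage with no recorded retention, and storage that nothing consumes.

```python
from dataclasses import dataclass, field
from typing import Optional

# Documentation the text asks for on every transformation (step 2).
REQUIRED_DOCS = ("logic", "trigger", "error_handling")


@dataclass
class Node:
    name: str
    kind: str  # "source", "transformation", "storage", or "consumer"
    docs: dict = field(default_factory=dict)  # used by transformations
    retention_days: Optional[int] = None      # used by storage points


def audit(nodes, edges):
    """Return findings for a flow map given as nodes plus (src, dst) edges."""
    has_input = {dst for _, dst in edges}
    has_output = {src for src, _ in edges}
    findings = []
    for n in nodes:
        # Step 1: anything that is not a source must be fed by something.
        if n.kind != "source" and n.name not in has_input:
            findings.append(f"{n.name}: no upstream source (ownership unclear)")
        # Step 2: every transformation needs logic, trigger, error handling.
        if n.kind == "transformation":
            missing = [k for k in REQUIRED_DOCS if not n.docs.get(k)]
            if missing:
                findings.append(f"{n.name}: undocumented {', '.join(missing)}")
        if n.kind == "storage":
            # Step 3: record how long data rests at each storage point.
            if n.retention_days is None:
                findings.append(f"{n.name}: no retention period recorded")
            # Step 4: storage with no reader is waste or a latent capability.
            if n.name not in has_output:
                findings.append(f"{n.name}: stored but never consumed")
    return findings


# Hypothetical example map.
NODES = [
    Node("crm.customers", "source"),
    Node("format_dates", "transformation",
         docs={"logic": "reformat to ISO 8601", "trigger": "nightly batch"}),
    Node("warehouse.customers", "storage", retention_days=365),
    Node("audit_cache", "storage"),
    Node("billing_run", "consumer"),
]
EDGES = [
    ("crm.customers", "format_dates"),
    ("format_dates", "warehouse.customers"),
    ("format_dates", "audit_cache"),
    ("warehouse.customers", "billing_run"),
]

findings = audit(NODES, EDGES)
for line in findings:
    print(line)
```

In this example the audit flags the missing error handling on format_dates and two gaps on audit_cache: it has no retention period and no consumer. Keeping the map as data rather than only as a diagram makes checks like these repeatable as the integration changes.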