AS-101 · Module 2

Data Hygiene

3 min read

What you paste into an AI tool matters. Every prompt you send, every file you upload, every code snippet you share leaves your security perimeter the moment you hit Enter. The AI provider may store it, log it, use it for model improvement, or expose it through a future vulnerability. You do not control what happens to it once it leaves your machine.

This is not hypothetical. Employees at major corporations have pasted proprietary source code, customer PII, internal strategy documents, and database credentials into AI chat interfaces. In each case, the employee was trying to work faster. In each case, the data left the organization's control. The problem is not malice — it is convenience. AI tools are so effective that people instinctively paste in whatever context they think will help, without pausing to consider what that context contains.

Do This

  • Redact PII, credentials, and proprietary information before pasting into any AI tool
  • Use anonymized or synthetic data when testing AI workflows
  • Check your AI provider's data retention and training policies — know where your data goes
  • Establish a clear organizational policy for what can and cannot be shared with AI tools
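The first item above can be partially automated. Here is a minimal sketch of a regex-based redactor that replaces common sensitive patterns with placeholders before a prompt is sent; the pattern names and examples are hypothetical, and a real deployment would extend the list to match your organization's data types.

```python
import re

# Hypothetical starter patterns -- extend for your organization's data.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "AWS_KEY": re.compile(r"AKIA[0-9A-Z]{16}"),   # AWS access key ID format
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US Social Security number
}

def redact(text: str) -> str:
    """Replace every match of each pattern with a [LABEL] placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

prompt = "Contact jane.doe@example.com, key AKIA1234567890ABCDEF"
print(redact(prompt))  # -> Contact [EMAIL], key [AWS_KEY]
```

Regex redaction is a floor, not a ceiling: it catches well-formed identifiers but will miss free-text PII, so it complements rather than replaces the human review described above.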

Avoid This

  • Pasting customer data, credentials, or confidential documents into external AI tools
  • Assuming that "private" or "enterprise" AI tiers automatically protect your data; always read the terms
  • Uploading entire codebases or databases to get a quick answer from a chatbot
  • Trusting "your data won't be used for training" claims without verifying the provider's current policy in writing

The practical rule is simple: before you paste anything into an AI tool, ask yourself "would I be comfortable if this appeared in a public data breach?" If the answer is no, redact it or use synthetic data. This mental test takes two seconds and prevents the kind of incident that takes two months to remediate.
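The two-second mental test can also be wired into tooling as a pre-flight gate. This is a hypothetical sketch (the deny-list and function name are illustrative, not an established API): it refuses to send any text that still matches a sensitive pattern, forcing the author to redact first.

```python
import re

# Hypothetical deny-list: anything matching these patterns blocks the send.
SENSITIVE = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),                  # email addresses
    re.compile(r"(?i)(api[_-]?key|password|secret)\s*[:=]"),  # credential assignments
]

def safe_to_send(text: str) -> bool:
    """Return True only if no sensitive pattern appears in the text."""
    return not any(p.search(text) for p in SENSITIVE)

print(safe_to_send("How do I join two tables in SQL?"))  # True
print(safe_to_send("password = hunter2"))                # False
```

A gate like this turns the "public data breach" question from a habit you hope people keep into a check that runs on every prompt.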