AS-101 · Module 2

Data Hygiene

3 min read

What you paste into an AI tool matters. Every prompt you send, every file you upload, every code snippet you share leaves your security perimeter the moment you hit Enter. The AI provider may store it, log it, use it for model improvement, or expose it through a future vulnerability. You do not control what happens to it once it leaves your machine.

This is not hypothetical. Employees at major corporations have pasted proprietary source code, customer PII, internal strategy documents, and database credentials into AI chat interfaces. In each case, the employee was trying to work faster. In each case, the data left the organization's control. The problem is not malice — it is convenience. AI tools are so effective that people instinctively paste in whatever context they think will help, without pausing to consider what that context contains.

Do This

  • Redact PII, credentials, and proprietary information before pasting into any AI tool
  • Use anonymized or synthetic data when testing AI workflows
  • Check your AI provider's data retention and training policies — know where your data goes
  • Establish a clear organizational policy for what can and cannot be shared with AI tools
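The first item above can be partially automated. Here is a minimal sketch of a regex-based redactor that replaces common sensitive patterns with placeholders before a prompt is sent; the pattern names and examples are hypothetical, and a real deployment would extend the list to match your organization's data types.

```python
import re

# Hypothetical starter patterns -- extend for your organization's data.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "AWS_KEY": re.compile(r"AKIA[0-9A-Z]{16}"),   # AWS access key ID format
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US Social Security number
}

def redact(text: str) -> str:
    """Replace every match of each pattern with a [LABEL] placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

prompt = "Contact jane.doe@example.com, key AKIA1234567890ABCDEF"
print(redact(prompt))  # -> Contact [EMAIL], key [AWS_KEY]
```

Regex redaction is a floor, not a ceiling: it catches well-formed identifiers but will miss free-text PII, so it complements rather than replaces the human review described above.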

Avoid This

  • Pasting customer data, credentials, or confidential documents into external AI tools
  • Assuming that "private" or "enterprise" AI tiers automatically protect your data; always read the terms
  • Uploading entire codebases or databases to get a quick answer from a chatbot
  • Trusting "your data won't be used for training" claims without verifying the provider's current policy in writing

The practical rule is simple: before you paste anything into an AI tool, ask yourself "would I be comfortable if this appeared in a public data breach?" If the answer is no, redact it or use synthetic data. This mental test takes two seconds and prevents the kind of incident that takes two months to remediate.
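The two-second mental test can also be wired into tooling as a pre-flight gate. This is a hypothetical sketch (the deny-list and function name are illustrative, not an established API): it refuses to send any text that still matches a sensitive pattern, forcing the author to redact first.

```python
import re

# Hypothetical deny-list: anything matching these patterns blocks the send.
SENSITIVE = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),                  # email addresses
    re.compile(r"(?i)(api[_-]?key|password|secret)\s*[:=]"),  # credential assignments
]

def safe_to_send(text: str) -> bool:
    """Return True only if no sensitive pattern appears in the text."""
    return not any(p.search(text) for p in SENSITIVE)

print(safe_to_send("How do I join two tables in SQL?"))  # True
print(safe_to_send("password = hunter2"))                # False
```

A gate like this turns the "public data breach" question from a habit you hope people keep into a check that runs on every prompt.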