Step 2 — Preparing Your Data for AI: Clean, Classify, Consolidate (1-Minute Read)
Part of the “Becoming AI-Ready” Series
AI is only as good as the data it can see, trust, and interpret.
Right now, most orgs have data everywhere — across Teams, SharePoint, OneDrive, file shares, laptops, old NAS devices, Dropbox, Box, and shadow IT.
If identity is the foundation, data is the fuel.
Here’s how to get it ready for AI.
1. Consolidate Your Data Locations
AI struggles to find and reconcile content that is scattered across 20 systems.
Goal: fewer storage locations, more consistency.
How-to:
Microsoft 365 content management with SharePoint and OneDrive
https://learn.microsoft.com/en-us/sharepoint/introduction
2. Classify What’s Sensitive (Automatically, If Possible)
AI needs context. Labels provide it.
How-to:
Learn about sensitivity labels
https://learn.microsoft.com/en-us/purview/sensitivity-labels
Auto-labeling (recommended!)
https://learn.microsoft.com/en-us/purview/apply-sensitivity-label-automatically
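To make the idea of auto-labeling concrete, here is a minimal Python sketch of pattern-based classification. It is not how Microsoft Purview works internally: the label names, the two patterns, and the suggest_label function are all hypothetical, and real detection is far more robust than a pair of regular expressions.

```python
import re

# Hypothetical label names and patterns, for illustration only.
# Ordered from most to least restrictive so the strictest match wins.
PATTERNS = [
    # Card-like numbers: four groups of four digits, optionally separated.
    ("Highly Confidential", re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")),
    # Email addresses (a deliberately loose pattern).
    ("Confidential", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")),
]
DEFAULT_LABEL = "General"


def suggest_label(text: str) -> str:
    """Return the first label whose pattern matches, else the default label."""
    for label, pattern in PATTERNS:
        if pattern.search(text):
            return label
    return DEFAULT_LABEL
```

The point of the sketch is the shape of the workflow: content is inspected, a rule matches, and a label is suggested without asking the user to decide.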
3. Apply DLP to Protect the Data AI Will Access
AI inherits the access of the user it works for, so classify first, then protect.

How-to:
Microsoft Purview DLP overview
https://learn.microsoft.com/en-us/purview/dlp-learn-about-dlp
4. Remove ROT Data (Redundant, Outdated, Trivial)
Yes — AI will process trash too.
You don’t want that.
How-to:
Use retention policies to clean up old content
https://learn.microsoft.com/en-us/purview/retention?tabs=table-overriden
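Before writing a retention policy, it can help to see how much ROT you actually have. The Python sketch below scans a local folder or file share for duplicate and stale files. It is an illustration rather than a Microsoft 365 tool: the find_rot function and the three-year default threshold are assumptions, and "trivial" content is left out because it needs human judgment.

```python
import hashlib
import time
from collections import defaultdict
from pathlib import Path


def find_rot(root: str, max_age_days: int = 365 * 3) -> dict:
    """Scan a folder tree for duplicate and stale files.

    redundant: files whose contents hash identically to an earlier file.
    outdated:  files not modified within max_age_days (an assumed threshold).
    """
    cutoff = time.time() - max_age_days * 86400
    by_hash = defaultdict(list)
    outdated = []

    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Reads each file fully into memory; fine for a sketch, not for huge files.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        by_hash[digest].append(path)
        if path.stat().st_mtime < cutoff:
            outdated.append(path)

    # Keep the first copy in each duplicate group and report the rest.
    redundant = [p for paths in by_hash.values() for p in paths[1:]]
    return {"redundant": redundant, "outdated": outdated}
```

Treat the output as a review list, not a delete list. A file that looks old may still be under a legal or business retention requirement.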
5. Secure External Sharing
AI can also “see” any shared document that the user can see, so overly broad sharing becomes overly broad AI access.

How-to:
Configure external sharing policies in Microsoft 365
https://learn.microsoft.com/en-us/sharepoint/external-sharing-overview
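The idea that AI sees what the user sees can be sketched in a few lines of Python. Everything here is hypothetical: the Document class, the allowed_users field, and the visible_to function stand in for a real permission model and do not represent any Microsoft 365 API.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    name: str
    # Hypothetical access list: the accounts allowed to open this document.
    allowed_users: set = field(default_factory=set)


def visible_to(user: str, documents: list) -> list:
    """Return names of the documents an assistant acting for `user` could draw on."""
    return [doc.name for doc in documents if user in doc.allowed_users]


docs = [
    Document("roadmap.docx", {"alice"}),
    Document("partner-pricing.xlsx", {"alice", "guest@partner.example"}),
]
```

In this example, an assistant working for guest@partner.example could draw on partner-pricing.xlsx but not roadmap.docx. Tightening the sharing list is what narrows the AI's view.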
Why It Matters
AI doesn’t magically fix data.
It amplifies the state your data is already in.
Cleaner data = better output
Classified data = safer output
Consolidated data = faster output
Next up:
Step 3 — Access Controls in an AI World: Least Privilege at Scale.
— Jean-Paul Abi Atme
