Data Scanning

Data Scanning APIs discover, classify, and validate personal data across structured data sources, such as databases and warehouses, without masking it. Scans run asynchronously.

Data Scanning is not available on trial accounts. A paid subscription is required.

What data scanning does

Data Scanning APIs do not mask data. They help you answer:

Which tables contain personally identifiable information (PII)
What type of PII exists per column
How confident the system is about detection
Where ML detection needs manual correction

Execution model

Data scanning runs asynchronously. You submit a scan job, it runs in the background, and you retrieve results when complete.
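The submit-then-poll pattern can be sketched as below. This is a minimal illustration, not the product's client: wait_for_scan and its parameters are hypothetical, the status lookup is passed in as a plain function so that no endpoint URL or response shape is assumed, and the "RUNNING" and "FAILED" status names are placeholders. Only SUCCESS comes from this page.

```python
import time
from typing import Callable, Iterable


def wait_for_scan(
    get_status: Callable[[str], str],   # hypothetical: wraps your call to the Scan Status API
    tracking_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    failure_statuses: Iterable[str] = ("FAILED",),  # assumed name, not taken from the docs
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status(tracking_id) until it reports SUCCESS.

    Raises RuntimeError on an assumed failure status and TimeoutError
    if the scan has not finished within timeout_s seconds.
    """
    failures = set(failure_statuses)
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status(tracking_id)
        if status == "SUCCESS":
            return status
        if status in failures:
            raise RuntimeError(f"scan {tracking_id} ended with status {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"scan {tracking_id} still {status} after {timeout_s}s")
        sleep(interval_s)


if __name__ == "__main__":
    # Stand-in for the real status call: two in-progress replies, then SUCCESS.
    replies = iter(["RUNNING", "RUNNING", "SUCCESS"])
    print(wait_for_scan(lambda _id: next(replies), "demo-tracking-id", sleep=lambda s: None))
```

Injecting get_status and sleep keeps the loop independent of any HTTP library and makes it easy to exercise without waiting on a real scan.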

Typical workflow

Submit scan

Use the Data Scan Async API to submit one or more objects for scanning. You receive a tracking_id.

Track progress

Poll the Scan Status API with the tracking_id until the status is SUCCESS.

Explore objects

Use List Scan Objects to browse the scanned data source hierarchy.

Inspect results

Use Scan Details to view column-level PII detection results with confidence percentages.

Tune detection

Use Update Scan Conclusions to adjust the confidence threshold used to classify columns as PII.

Correct ML output

Use Update or Delete Detected Entities to manually override incorrect ML results.
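The last two steps, tuning the threshold and correcting ML output, can be pictured with a small local model. The service applies these server-side; the sketch below only illustrates the idea. ColumnDetection, pii_columns, apply_overrides, the entity labels, and the sample values are all made up for illustration, and treating a column as PII when its confidence is at or above the threshold (rather than strictly above) is an assumption.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ColumnDetection:
    column: str
    entity_type: str       # illustrative label, e.g. "EMAIL"
    confidence_pct: float  # detection confidence, 0 to 100


def pii_columns(detections: List[ColumnDetection], threshold_pct: float) -> List[str]:
    """Return the columns whose confidence meets the threshold (assumed inclusive)."""
    return [d.column for d in detections if d.confidence_pct >= threshold_pct]


def apply_overrides(
    detections: List[ColumnDetection],
    overrides: Dict[str, Optional[str]],
) -> List[ColumnDetection]:
    """Apply manual corrections: a string replaces the entity type, None deletes the detection."""
    corrected = []
    for d in detections:
        if d.column not in overrides:
            corrected.append(d)
        elif overrides[d.column] is not None:
            corrected.append(replace(d, entity_type=overrides[d.column]))
        # An override of None drops the detection entirely.
    return corrected


if __name__ == "__main__":
    # Made-up sample results for three columns.
    sample = [
        ColumnDetection("email", "EMAIL", 97.0),
        ColumnDetection("notes", "NAME", 62.5),
        ColumnDetection("zip", "ADDRESS", 48.0),
    ]
    print(pii_columns(sample, 50))  # ['email', 'notes']
    print(pii_columns(sample, 80))  # ['email']
    fixed = apply_overrides(sample, {"notes": None, "zip": "POSTAL_CODE"})
    print([(d.column, d.entity_type) for d in fixed])
```

Raising the threshold from 50 to 80 drops the borderline "notes" column from the PII set, which is the trade-off the threshold controls: fewer false positives at the cost of possibly missing lower-confidence detections.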

API reference

Submit Data Scan

Submit a scan job for one or more data source objects.

Scan Status

Check the execution status of submitted scans.

List Scan Objects

Browse objects available for scanning under a data source.

Scan Details

Fetch column-level PII detection results with confidence scores.

Update Conclusions

Adjust the confidence threshold for PII classification.

Update & Delete Entities

Manually correct or remove ML-detected PII results.
