Data Scanning
Data Scanning APIs discover, classify, and validate personal data (PII) across structured data sources such as databases and warehouses. They do not mask data, and scans run asynchronously.
Data Scanning is not available on trial accounts. A paid subscription is required.
What data scanning does
Data Scanning APIs do not mask data. They help you answer:
- Which tables contain PII
- What type of PII exists per column
- How confident the system is about detection
- Where ML detection needs manual correction
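To make these four questions concrete, here is a minimal Python sketch of how column-level findings could be summarized on the client side. The `ColumnFinding` fields, the `summarize` helper, and the `review_below` cutoff are all hypothetical names chosen for illustration; the actual response schema is defined by the Scan Details API and may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnFinding:
    """One detected PII type on one column (hypothetical shape, not the API schema)."""
    table: str
    column: str
    pii_type: str
    confidence: float  # confidence as a percentage, 0-100


def summarize(findings, review_below=60.0):
    """Group findings by table and flag low-confidence ones for manual review.

    `review_below` is an illustrative cutoff, not a value taken from the product.
    """
    by_table = defaultdict(list)
    needs_review = []
    for finding in findings:
        by_table[finding.table].append(
            (finding.column, finding.pii_type, finding.confidence)
        )
        if finding.confidence < review_below:
            needs_review.append(finding)
    return dict(by_table), needs_review
```

The keys of the first return value answer "which tables contain PII", each tuple answers "what type and how confident", and the second return value lists candidates for manual correction.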
Execution model
Data Scanning runs asynchronously: you submit a scan job, it runs in the background, and you retrieve the results once it completes.
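The submit-then-poll pattern can be sketched in Python as follows. This is an illustration under stated assumptions, not the official client: `fetch_status` is a hypothetical callable that you would implement against the Scan Status API, and `FAILED` is an assumed failure status. Only `tracking_id` and the `SUCCESS` status come from this page.

```python
import time
from typing import Callable

# Assumption: the name of the failure status is not stated on this page.
FAILURE_STATUSES = {"FAILED"}


def wait_for_scan(
    tracking_id: str,
    fetch_status: Callable[[str], str],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the scan reports SUCCESS, fails, or the timeout elapses.

    `fetch_status` is a hypothetical wrapper around the Scan Status API that
    returns the current status string for a tracking_id.
    """
    waited = 0.0
    while True:
        status = fetch_status(tracking_id)
        if status == "SUCCESS":
            return status
        if status in FAILURE_STATUSES:
            raise RuntimeError(f"scan {tracking_id} ended with status {status}")
        if waited >= timeout:
            raise TimeoutError(f"scan {tracking_id} still {status} after {timeout}s")
        sleep(poll_interval)
        waited += poll_interval
```

Passing the status lookup and the sleep function as parameters keeps the loop independent of any HTTP library and makes it easy to exercise with a stub before pointing it at the real API.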
Typical workflow
Submit scan
Use the Data Scan Async API to submit one or more objects for scanning. You receive a tracking_id.
Track progress
Poll the Scan Status API using the tracking_id until status is SUCCESS.
Explore objects
Use List Scan Objects to browse the scanned data source hierarchy.
Inspect results
Use Scan Details to view column-level PII detection results with confidence percentages.
Tune detection
Use Update Scan Conclusions to adjust the confidence threshold used to classify columns as PII.
Correct ML output
Use Update or Delete Detected Entities to manually override incorrect ML results.
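The last two steps, tuning the threshold and correcting ML output, can be illustrated with a small local model in Python. This sketch only mirrors the idea described above; it does not call the API, and the details are assumptions: that a column is classified by its highest-confidence PII type, that the comparison is "greater than or equal to", and that a manual override always wins. The function and parameter names are hypothetical.

```python
def classify_columns(confidences, threshold, overrides=None):
    """Return {column: pii_type or None} for a given confidence threshold.

    confidences: {column: {pii_type: confidence_percentage}}
    overrides:   {column: pii_type or None}, manual corrections that take
                 precedence over ML output (None means "not PII").
    """
    overrides = overrides or {}
    result = {}
    for column, scores in confidences.items():
        if column in overrides:
            # A manual correction replaces whatever the ML detection reported.
            result[column] = overrides[column]
            continue
        # Pick the most confident detected type; an empty dict yields no type.
        best_type, best_score = max(
            scores.items(), key=lambda item: item[1], default=(None, 0.0)
        )
        is_pii = best_type is not None and best_score >= threshold
        result[column] = best_type if is_pii else None
    return result
```

For example, a column detected as `NAME` at 55 percent is not classified as PII at a threshold of 60, becomes PII when the threshold is lowered to 50, and can be forced either way with an override.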
API reference
Submit Data Scan
Submit a scan job for one or more data source objects.
Scan Status
Check the execution status of submitted scans.
List Scan Objects
Browse objects available for scanning under a data source.
Scan Details
Fetch column-level PII detection results with confidence scores.
Update Conclusions
Adjust the confidence threshold for PII classification.
Update & Delete Entities
Manually correct or remove ML-detected PII results.
Last updated 2 weeks ago