Toxicity Detection
How Protecto automatically analyzes text for harmful content during masking and unmasking, returning structured safety scores.
Protecto automatically analyzes text for toxic or harmful language while processing mask and unmask requests.
Toxicity detection is passive and additive. It does not block masking or unmasking. Instead, it returns structured scores that let you decide how to handle content downstream.
Where toxicity detection is available
| API | Toxicity detection supported |
|---|---|
| Identify and Mask (Auto-detect) | Yes |
| Unmask API | Yes |
| Mask with Token | No |
| Mask with Format | No |
The analysis runs on the original semantic content of the text, even when PII is masked or unmasked alongside it.
What the scores mean
Each input text is scored on a continuous scale from 0.0 to 1.0:
- `0.0` means the category is not present
- `1.0` means an extremely strong presence of that category
Protecto enforces no hard thresholds. You apply your own rules, for example:
- Log content above `0.3`
- Warn users above `0.6`
- Block content above `0.8`
Returned scores are floating-point values.
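The threshold rules above can be sketched as a small helper. This is a minimal illustration, not part of the Protecto SDK; the function name and cutoffs are assumptions based on the example thresholds:

```python
def classify_toxicity(score: float) -> str:
    """Map a 0.0-1.0 toxicity score to an action.

    Thresholds are the example values from this page (0.3 / 0.6 / 0.8);
    tune them for your own application.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("toxicity scores are floats between 0.0 and 1.0")
    if score > 0.8:
        return "block"    # block content above 0.8
    if score > 0.6:
        return "warn"     # warn users above 0.6
    if score > 0.3:
        return "log"      # log content above 0.3
    return "allow"

print(classify_toxicity(0.05))  # allow
print(classify_toxicity(0.45))  # log
print(classify_toxicity(0.95))  # block
```

Because detection is passive, a helper like this is where you enforce policy downstream of the API response.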
Next steps
- **Scoring System**: How scores are calculated and how to interpret them.
- **Categories**: All six toxicity categories and what they detect.
- **Examples**: Real-world score examples across different types of language.
- **In Mask API**: How toxicity data appears in auto-detect masking responses.
- **In Unmask API**: How toxicity data appears in unmask responses.