
Toxicity Detection

How Protecto automatically analyzes text for harmful content during masking and unmasking, returning structured safety scores.

Protecto automatically analyzes text for toxic or harmful language while processing mask and unmask requests.

Toxicity detection is passive and additive. It does not block masking or unmasking. Instead, it returns structured scores that let you decide how to handle content downstream.

Where toxicity detection is available

| API | Toxicity detection supported |
| --- | --- |
| Identify and Mask (Auto-detect) | Yes |
| Unmask API | Yes |
| Mask with Token | No |
| Mask with Format | No |

The analysis runs on the original semantic content of the text, even when PII is masked or unmasked alongside it.

What the scores mean

Each input text is scored on a continuous scale from 0.0 to 1.0:

  • 0.0 means the category is not present

  • 1.0 means an extremely strong presence of that category

Protecto enforces no hard thresholds; you apply your own rules. For example:

  • Log content above 0.3

  • Warn users above 0.6

  • Block content above 0.8

Returned scores are floating-point values.
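The threshold rules above can be sketched as a small policy function. This is an illustrative sketch only: the function name and the specific cutoff values (0.3, 0.6, 0.8) are the example thresholds from the text, not values Protecto enforces.

```python
def handle_toxicity_score(score: float) -> str:
    """Map a toxicity score (0.0-1.0) to a downstream action.

    Thresholds are the illustrative values from the text above;
    tune them for your own risk tolerance.
    """
    if score > 0.8:
        return "block"  # block content above 0.8
    if score > 0.6:
        return "warn"   # warn users above 0.6
    if score > 0.3:
        return "log"    # log content above 0.3
    return "allow"      # no action for low scores


print(handle_toxicity_score(0.92))  # -> block
```

Because detection is passive, a function like this runs entirely in your own code after the mask or unmask response returns; Protecto never blocks the request itself.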

Next steps