
Toxicity Detection

How Protecto automatically analyzes text for harmful content during masking and unmasking, returning structured safety scores.

Protecto automatically analyzes text for toxic or harmful language while processing mask and unmask requests.

Toxicity detection is passive and additive. It does not block masking or unmasking. Instead, it returns structured scores that let you decide how to handle content downstream.

Where toxicity detection is available

| API | Toxicity detection supported |
| --- | --- |
| Identify and Mask (Auto-detect) | Yes |
| Unmask API | Yes |
| Mask with Token | No |
| Mask with Format | No |

The analysis runs on the original semantic content of the text, even when PII is masked or unmasked alongside it.

What the scores mean

Each input text is scored on a continuous scale from 0.0 to 1.0:

  • 0.0 means the category is not present

  • 1.0 means an extremely strong presence of that category

Protecto enforces no hard thresholds; you apply your own rules. For example:

  • Log content above 0.3

  • Warn users above 0.6

  • Block content above 0.8

Returned scores are floating-point values.
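The threshold rules above can be sketched as a small policy function. This is an illustrative sketch only: the function name and the specific cutoff values (0.3, 0.6, 0.8) are the example thresholds from the text, not values Protecto enforces.

```python
def handle_toxicity_score(score: float) -> str:
    """Map a toxicity score (0.0-1.0) to a downstream action.

    Thresholds are the illustrative values from the text above;
    tune them for your own risk tolerance.
    """
    if score > 0.8:
        return "block"  # block content above 0.8
    if score > 0.6:
        return "warn"   # warn users above 0.6
    if score > 0.3:
        return "log"    # log content above 0.3
    return "allow"      # no action for low scores


print(handle_toxicity_score(0.92))  # -> block
```

Because detection is passive, a function like this runs entirely in your own code after the mask or unmask response returns; Protecto never blocks the request itself.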

Next steps