Activate offensiveness protection for generative AI

Release version: Yokohama

Updated July 31, 2025

Activate offensiveness detection to log or block offensive content generated by Now Assist skills and workflows.

Before you begin

Role required: sn_generative_ai.nsa_admin

About this task

Generative AI output is probabilistic, which means that the same input can produce different outputs. Some AI-generated content may be offensive, including toxic, sexist, or otherwise harmful language. Now Assist Guardian detects offensive content in both inputs and outputs and logs an event each time it is detected. You can also configure it to block offensive material so that users see a standard error message instead of the generated response.

Note:

Offensiveness detection applies only to specific Now Assist skills and workflows. It is not available for all Now Assist applications. For the list of skills that support offensiveness detection, see Now Assist Guardian.

You can export logs for review. For more information, see Export Now Assist Guardian logs.

Procedure

Navigate to All > Now Assist Admin > Settings.
In the side panel, select the Now Assist Guardian > Offensiveness tab.
Go to the Available for you tab to see which workflows you can choose from.
Offensiveness guardrails that are already activated appear in the Active tab.
Select Activate for the workflow on which you want to enable offensiveness detection.
Select the detection impact.
- Select Log only to record an event when offensive content is detected. The content is still shown to the user.
- Select Block and log to record the event and prevent the content from being shown to the user. The user sees a standard error message instead.
Select Save.
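
The difference between the two detection impact options can be sketched in code. The following is an illustrative model only, not the Now Assist Guardian implementation or API: the names `DetectionImpact`, `apply_guardrail`, `toy_detector`, and `STANDARD_ERROR` are hypothetical, and the detector is a trivial stand-in for the real classifier.

```python
from enum import Enum
from typing import Callable, List


class DetectionImpact(Enum):
    """The two options offered in the procedure above."""
    LOG_ONLY = "Log only"
    BLOCK_AND_LOG = "Block and log"


# Hypothetical placeholder: the actual error message text is not given in this topic.
STANDARD_ERROR = "[standard error message]"


def apply_guardrail(
    text: str,
    impact: DetectionImpact,
    is_offensive: Callable[[str], bool],
    log: List[str],
) -> str:
    """Return what the user would see; append to `log` when content is flagged."""
    if not is_offensive(text):
        return text  # nothing detected, nothing logged
    # Both options record the event.
    log.append(f"offensive content detected (impact: {impact.value})")
    if impact is DetectionImpact.BLOCK_AND_LOG:
        return STANDARD_ERROR  # content is withheld from the user
    return text  # Log only: content is still shown


def toy_detector(text: str) -> bool:
    """Toy stand-in for a real offensiveness classifier (assumption)."""
    return "badword" in text.lower()
```

In this model, both options write a log entry when content is flagged; only Block and log replaces the content with the error message.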

Result

The offensiveness detection guardrail is enabled on your instance for the selected workflow. Events are logged when offensive content is detected in an input or a generated output.

What to do next

You can enable offensiveness detection separately for each supported Now Assist application and workflow. Repeat this task for each workflow on which you want offensiveness protection enabled.

To change the detection impact for an active workflow, select the more options icon in the list of active workflows and then select Edit.

To deactivate offensiveness protection for a workflow, select the more options icon in the list of active workflows and then select Deactivate.
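
Because the guardrail is managed per workflow, it can help to think of the settings as an independent entry for each workflow. The sketch below is a hypothetical in-memory model of that idea, assuming nothing about how the platform stores these settings; the class `GuardrailSettings`, its methods, and the workflow names in the test are invented for illustration.

```python
from typing import Dict, Optional

VALID_IMPACTS = ("Log only", "Block and log")


class GuardrailSettings:
    """Hypothetical model: one detection impact per activated workflow."""

    def __init__(self) -> None:
        self._active: Dict[str, str] = {}

    def activate(self, workflow: str, impact: str) -> None:
        """Activate detection for one workflow with the chosen impact."""
        if impact not in VALID_IMPACTS:
            raise ValueError(f"unknown detection impact: {impact}")
        self._active[workflow] = impact

    def edit(self, workflow: str, impact: str) -> None:
        """Change the impact of a workflow that is already active."""
        if workflow not in self._active:
            raise KeyError(workflow)
        self.activate(workflow, impact)

    def deactivate(self, workflow: str) -> None:
        """Remove protection for one workflow; others are unaffected."""
        self._active.pop(workflow, None)

    def impact_for(self, workflow: str) -> Optional[str]:
        """Return the impact for a workflow, or None if it is not active."""
        return self._active.get(workflow)
```

Editing or deactivating one workflow in this model leaves every other workflow's setting unchanged, which mirrors the per-workflow behavior described above.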