Now Assist Guardian

  • Release version: Australia
  • Updated Mar 12, 2026
  • 8 min. read
  • Enable Now Assist Guardian, built with ServiceNow Small Language Model, to monitor and evaluate content created with generative AI to help protect and enhance the user experience.

    Now Assist Guardian Overview

    Generative AI is an emerging technology. Human interactions are unpredictable, and the outputs of large language models (LLMs) are probabilistic: running the same input twice may generate two different outputs. Managing risk is therefore an important part of deciding how you want to implement generative AI on your instances.
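
    To illustrate what "probabilistic" means here, the following is a minimal sketch of sampling-based text generation. It is not how any ServiceNow model is implemented; the token list, the probabilities, and the function names are all made up for illustration.

```python
import random

# Hypothetical next-token probabilities for one fixed prompt (illustrative values only).
NEXT_TOKEN_PROBS = {"approved": 0.5, "pending": 0.3, "rejected": 0.2}


def sample_token(rng: random.Random) -> str:
    """Draw one token according to its probability, as a sampling decoder would."""
    tokens = list(NEXT_TOKEN_PROBS)
    weights = list(NEXT_TOKEN_PROBS.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


def generate(seed: int, length: int = 5) -> list[str]:
    """Generate a short token sequence; the seed stands in for run-to-run randomness."""
    rng = random.Random(seed)
    return [sample_token(rng) for _ in range(length)]


if __name__ == "__main__":
    # Same "prompt" (same probabilities), different runs: the outputs can differ.
    print(generate(seed=1))
    print(generate(seed=2))
```

    Because each token is drawn at random from a distribution, two runs over the same input are not guaranteed to agree, which is why monitoring outputs matters.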

    Now Assist Guardian monitors requests sent to LLMs and their responses to help protect you, your users, and your data. It monitors for three types of content: offensive or harmful content, prompt injection attempts, and filtered subjects. For offensive content and prompt injection attempts, logs are generated if logging is activated, and you can also choose to block the content. When a filter is activated, content that the filter detects redirects the user to the Sensitivity Detection: Fallback topic in Virtual Agent.

    Guardrails

    Offensive content
    Due to the probabilistic nature of generative AI, it's possible for an LLM to generate offensive content. If there's offensive content in the input of the request, offensive content can also occur in the response. Examples of offensive content include language that is toxic, defamatory, or fraudulent.
    Prompt injection
    Prompt injection is a type of security attack where bad actors override the normal instructions of an LLM to access restricted information or elicit unexpected behaviors. Prompt injection detection is based on an LLM that has been trained on various prompt injection techniques, such as role playing, paraphrasing, repetition, instructions to ignore other instructions, and persuasion. However, because the model is probabilistic and prompt injection techniques keep evolving, Now Assist Guardian may not identify every prompt injection attempt.
    Filtered subjects
    Certain subjects, such as workplace safety or employee compensation, might not be best suited for generative AI conversations. You can activate filters that detect if these kinds of subjects are included in the conversation so that you can redirect the user to the Sensitivity Detection: Fallback Virtual Agent topic.

    Logging and blocking

    Now Assist Guardian can monitor requests and log when these kinds of material are detected. You can access the logs from the Now Assist Admin console, on the Now Assist Guardian page of the Settings tab. The logs include information about the request and the conversation that contains the offensive content, including user feedback.

    Besides logging, you can also choose to block offensive content or prompt injection attempts. If the material is detected and blocking is turned on, you see a standard error message indicating that the request couldn't be completed instead of the generated response, and you don't see what the AI generated.

    Before deciding to block content, you can monitor logs for some time to determine how prevalent these issues are for you and your use cases.
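
    One way to judge prevalence during a monitoring period is to compute, per guardrail, the share of requests that were flagged. The sketch below assumes you have exported log entries into dictionaries with a hypothetical "guardrail" key; that shape is an assumption for illustration, not the actual Now Assist Guardian log format.

```python
from collections import Counter


def detection_rates(total_requests: int, log_entries: list[dict]) -> dict[str, float]:
    """Return the share of all requests flagged by each guardrail.

    `log_entries` uses an assumed shape: one dict per detection with a
    "guardrail" key such as "offensive" or "prompt_injection".
    """
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    counts = Counter(entry["guardrail"] for entry in log_entries)
    return {name: count / total_requests for name, count in counts.items()}


if __name__ == "__main__":
    sample_log = [
        {"guardrail": "offensive"},
        {"guardrail": "offensive"},
        {"guardrail": "prompt_injection"},
    ]
    print(detection_rates(total_requests=8, log_entries=sample_log))
```

    A low rate over a representative period may suggest that logging alone is enough; a higher rate may be an argument for turning on blocking.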

    Redirection for sensitive filtered topics

    Once a topic that a filter applies to has been identified, the user is redirected to a different Virtual Agent topic, depending on the type of filter. Filters for subjects like employee personal issues redirect to the Sensitivity Detection: Fallback topic. This topic can redirect a user to a live agent or help them create an HR case.

    Users also have the option to override the redirection by selecting Proceed, not sensitive, which returns them to their original topic without initiating the fallback flow.
    Note:
    Once the user continues with the fallback topic, for example, by beginning the flow to create an HR case, the Virtual Agent does not continue detecting sensitive topics within that conversation.
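
    The redirection behavior described in this section can be read as a small state machine. The sketch below models it under stated assumptions: the Conversation class, its fields, and the function names are hypothetical and are not part of any Virtual Agent API. Only the topic name and the Proceed, not sensitive option come from the text above.

```python
from dataclasses import dataclass

FALLBACK_TOPIC = "Sensitivity Detection: Fallback"
OVERRIDE_CHOICE = "Proceed, not sensitive"


@dataclass
class Conversation:
    """Hypothetical conversation state, for illustration only."""
    current_topic: str
    original_topic: str = ""
    detection_active: bool = True


def on_filter_match(conv: Conversation) -> str:
    """Redirect to the fallback topic, unless detection has already stopped."""
    if not conv.detection_active:
        return conv.current_topic
    conv.original_topic = conv.current_topic
    conv.current_topic = FALLBACK_TOPIC
    return conv.current_topic


def on_user_choice(conv: Conversation, choice: str) -> str:
    """Handle the user's answer while they are in the fallback topic."""
    if choice == OVERRIDE_CHOICE:
        # The user overrides the redirection and returns to the original topic.
        conv.current_topic = conv.original_topic
    else:
        # The user continues with the fallback flow (for example, creating an
        # HR case), so sensitive-topic detection stops for this conversation.
        conv.detection_active = False
    return conv.current_topic
```

    In this model, a user who selects the override keeps detection active for later messages, while a user who continues with the fallback flow is not redirected again.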

    Now Assist Guardian at runtime

    All skills that use Now Assist Guardian remove personally identifiable information (PII) before the request reaches the LLM. You can choose what kinds of data are caught. See Configuring Now Assist for Data Privacy for more information.
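
    As a rough illustration of removing PII before a request reaches the LLM, with a configurable choice of which kinds of data are caught, consider the sketch below. The two regular expressions and the function name are assumptions chosen for brevity; they do not reflect the patterns or configuration that Now Assist for Data Privacy actually uses.

```python
import re

# Hypothetical, deliberately simple patterns; real PII detection is broader.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def mask_pii(text: str, enabled_kinds: set[str]) -> str:
    """Replace each enabled kind of PII with a placeholder such as [EMAIL]."""
    for kind in sorted(enabled_kinds):
        text = PII_PATTERNS[kind].sub(f"[{kind.upper()}]", text)
    return text


if __name__ == "__main__":
    prompt = "Contact jo@example.com or 555-123-4567"
    print(mask_pii(prompt, {"email", "us_phone"}))
    print(mask_pii(prompt, {"email"}))
```

    Only the kinds listed in enabled_kinds are masked, which mirrors the idea that you choose what kinds of data are caught.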

    For conversational skills, semantic search processes requests to determine whether a request matches an active filter. If so, the user is redirected to a Virtual Agent topic that asks if they want to make an HR case or speak to a live agent.

    Infographic showing Now Assist Guardian at runtime with sensitivity filter guardrail

    For catalog item generation and agent skills like summarization and resolution note generation, offensiveness and prompt injection guardrails are run on inputs and outputs of requests. If either is detected, Now Assist Guardian logs the request. If you’ve chosen to block this content, then a standard error message appears, and the user doesn’t see the result of the request.

    Infographic showing Now Assist Guardian at runtime with offensiveness and security guardrails
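
    The flow for the offensiveness and prompt injection guardrails can be sketched as follows. This is a simplified model, not the product's implementation: the detector and the model call are passed in as plain functions, the error wording is a placeholder, and the sketch assumes the model is always called even when the input is flagged, which the documentation does not specify.

```python
from typing import Callable

STANDARD_ERROR = "The request couldn't be completed."  # placeholder wording


def run_with_guardrails(
    prompt: str,
    call_llm: Callable[[str], str],
    is_flagged: Callable[[str], bool],
    blocking_enabled: bool,
    log: list[dict],
) -> str:
    """Check the input, call the model, check the output, then log or block."""
    input_flagged = is_flagged(prompt)
    response = call_llm(prompt)  # assumption: always called in this sketch
    output_flagged = is_flagged(response)

    if input_flagged or output_flagged:
        log.append({
            "prompt": prompt,
            "input_flagged": input_flagged,
            "output_flagged": output_flagged,
            "blocked": blocking_enabled,
        })
        if blocking_enabled:
            # The user sees the standard error, not the generated response.
            return STANDARD_ERROR
    return response
```

    With blocking_enabled set to False the function behaves as monitor-only: detections are logged, but the user still receives the response.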

    Safety Categories

    The following are the different kinds of content that can be flagged as offensive or as posing a security risk. You can identify which categories are being caught in the Now Assist Guardian logs for the different guardrails.

    Category (Category ID)
    Description

    Toxic Content (O1)
    Should not: Promote hate speech, incite violence such as child abuse or bullying, or engage in personal attacks or harassment.
    Can: Discuss the impacts and societal issues related to toxic content for educational purposes.

    Unfair Representation (O2)
    Should not: Assist in perpetuating stereotypes or encourage discrimination or bias.
    Can: Discuss issues of representation and promote awareness and inclusivity.

    Adult Content (O3)
    Should not: Display sexually explicit material or content inappropriate for minors, including pornography, erotic chat, and sexual activity.
    Can: Offer educational discussions on topics related to sexuality and consent.

    Erosion of Trust in Public Information (O4)
    Should not: Spread misinformation or disinformation that undermines public trust, such as misleading medical advice, illegal drugs, and advice.
    Can: Critically analyze misinformation trends and educate about media literacy.

    Propagating Misconceptions/False Beliefs (O5)
    Should not: Disseminate unverified or misleading information, conspiracy theories, science denialism, historical revisionism, or content that could lead to radicalization.
    Can: Provide critical analysis and debunking of common misconceptions and false beliefs.

    Risky Financial Practices (O6)
    Should not: Promote unethical trade practices such as multi-level marketing, deceptive lending practices, or gambling.
    Can: Educate about associated risks, ethical lending, responsible gambling, trade compliance, and fair business practices.

    Trade and Compliance (O7)
    Should not: Involve labor issues, engage in unethical trade affairs, contribute to unfair labor practices, or promote fraudulent trade practices.
    Can: Provide information and insights into labor issues, ethical trade practices, trade compliance, and fair business conduct.

    Dissemination of Dangerous Information (O8)
    Should not: Spread fake news, propagate social media rumors, disclose government secrets, or leak financial data or other content that contributes to public panic or threatens national security.
    Can: Educate about the dangers of such information and promote safety.

    Privacy Infringement (O9)
    Should not: Involve doxxing, unauthorized data sharing, or surveillance, such as inferring or leaking private information.
    Can: Discuss the importance of privacy rights and digital security.

    Security Threats (O10)
    Should not: Spread malware or encourage terrorism or other security threats, such as weapon generation, cyberattacks, or chemical harm.
    Can: Provide information and methods to identify and mitigate security threats.

    Defamation (O11)
    Should not: Publish false statements damaging to individuals, groups, or organizations.
    Can: Explore the legal and ethical aspects of defamation and free speech.

    Fraud or Deceptive Action (O12)
    Should not: Engage in academic dishonesty, perpetrate scams, promote pseudo-pharmaceuticals, or disseminate fake online content. Content should not deceive or defraud individuals.
    Can: Educate about recognizing and preventing fraudulent activities.

    Influence Operations (O13)
    Should not: Manipulate public opinion or disrupt societal harmony for political or ideological purposes, such as through surveillance or campaigns.
    Can: Analyze and educate about the nature and impact of influence operations.

    Illegal Activities (O14)
    Should not: Promote or provide guidance on engaging in illegal actions like crimes or terrorism.
    Can: Educate about the consequences and prevention of illegal activities.

    Persuasion and Manipulation (O15)
    Should not: Involve persuasion or manipulation techniques that promote self-harm, psychological manipulation, or any form of harm to individuals' well-being.
    Can: Educate about recognizing and resisting manipulation, and offer support and resources for those affected.

    Violation of Personal Property (O16)
    Should not: Violate an individual's personal property, including through malicious mimicking, plagiarism, copyright infringement, or identity forgery.
    Can: Promote awareness and discussions on respecting personal property and preventing such violations.
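
    When reading category IDs in the Now Assist Guardian logs, a small lookup can translate them back to category names. The mapping below is taken from the safety categories listed in this section; the helper function and the way IDs are normalized are assumptions for illustration, since the exact log format is not described here.

```python
# Category IDs and names as listed in the Safety Categories section.
SAFETY_CATEGORIES = {
    "O1": "Toxic Content",
    "O2": "Unfair Representation",
    "O3": "Adult Content",
    "O4": "Erosion of Trust in Public Information",
    "O5": "Propagating Misconceptions/False Beliefs",
    "O6": "Risky Financial Practices",
    "O7": "Trade and Compliance",
    "O8": "Dissemination of Dangerous Information",
    "O9": "Privacy Infringement",
    "O10": "Security Threats",
    "O11": "Defamation",
    "O12": "Fraud or Deceptive Action",
    "O13": "Influence Operations",
    "O14": "Illegal Activities",
    "O15": "Persuasion and Manipulation",
    "O16": "Violation of Personal Property",
}


def category_name(category_id: str) -> str:
    """Return the category name for an ID such as "O12" (hypothetical helper)."""
    return SAFETY_CATEGORIES.get(category_id.strip().upper(), "Unknown category")


if __name__ == "__main__":
    for cid in ["O1", "o12", "O99"]:
        print(cid, "->", category_name(cid))
```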

    Skills that support Now Assist Guardian

    Table 1. Supported skills by workflow
    Workflow / Supported skills by product

    Technology
        Now Assist for Configuration Management Database (CMDB)
        Now Assist for ITOM
        Now Assist for IT Service Management (ITSM)
        Now Assist for Security Incident Response
        Now Assist for Strategic Portfolio Management (SPM)

    Customer
        Now Assist for Customer Service Management (CSM)
        Now Assist for Field Service Management (FSM)
        Now Assist for Financial Services Operations (FSO)
        Now Assist for Public Sector Digital Services (PSDS)

    Employee
        Now Assist for Health and Safety
            Incident summarization
        Now Assist for HR Service Delivery (HRSD)
        Now Assist for Legal Service Delivery (LSD)
            Legal request summarization
            Skills for Now Assist in Contract Management:

    Creator
        Now Assist for Creator
            Catalog item generation

    Finance & Supply Chain
        Now Assist for Accounts Payable Operations (APO)
            Record summarization
        Now Assist for Supplier Lifecycle Operations (SLO)
            Supplier case summarization
        Now Assist for Sourcing and Procurement Operations (SPO)
            Record summarization