Create Guardrail Policies

Overview

The Create Guardrail Policies page allows you to define enforcement rules that evaluate agent requests and responses against security, safety, and compliance criteria.

Each guardrail policy combines metadata, content filters, detection logic, and deployment scope into a single configuration that applies consistently across selected agents and servers.

Access Guardrail Policies

You can access guardrail policy configuration from the Akto console.

  • Navigate to the Agentic Security product.

  • Select Agentic Guardrails → Guardrail Policies.

The guardrail policies list displays existing policies and provides access to policy creation.

Create a Guardrail Policy

Access the Create Guardrail Form

  1. Locate the Create Guardrail button in the top-right corner of the Guardrail Policies page.

  2. Select Create Guardrail to open the guardrail configuration form.

Fill the Configuration Form

The configuration form is organised into multiple sections, each targeting a specific enforcement layer.

1. Provide Guardrail Policy Details (Mandatory)

This section defines identifying metadata and user-facing enforcement messages.

  • Enter a policy name and description.

  • Select the severity level to classify the importance of violations generated by the guardrail policy.

  • Define the blocked message shown when a request is denied.

  • Enable the option to apply the same blocked message to responses, if consistent messaging is required across requests and responses.

This section is required to create a guardrail policy.

2. Content & Policy Guardrails

Configure content moderation and policy-based controls:

Prompt Injection Attack Filter

  • Enable the prompt attack filter to detect jailbreaks and manipulation attempts.

  • Adjust detection strength using the horizontal slider.

Context Poisoning Attacks

  • Enable the context poisoning filter to detect attempts to manipulate agent memory or context.

Add Denied Topics

Denied topics block specific concepts in user inputs or model responses.

  • Select Add Denied Topic to open the topic configuration form.

  • Provide a topic name, definition, and optional sample phrases.

  • Use + Add to include additional topics.

  • Select Save Topic after completing topic configuration.

You can add up to 30 denied topics per guardrail policy.
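As a rough illustration of the fields and the limit described above, the sketch below models a denied topic and enforces the 30-topic cap. The names DeniedTopic and add_denied_topic are hypothetical and are not part of any Akto API; the form in the console is the actual way to configure topics.

```python
from dataclasses import dataclass, field

MAX_DENIED_TOPICS = 30  # limit stated in the documentation above


@dataclass
class DeniedTopic:
    """Hypothetical model of one denied topic entry."""
    name: str
    definition: str
    sample_phrases: list[str] = field(default_factory=list)  # optional


def add_denied_topic(topics: list[DeniedTopic], topic: DeniedTopic) -> list[DeniedTopic]:
    """Return a new list with the topic appended, enforcing the assumed rules."""
    if not topic.name.strip() or not topic.definition.strip():
        raise ValueError("a denied topic needs a name and a definition")
    if len(topics) >= MAX_DENIED_TOPICS:
        raise ValueError(f"a policy can hold at most {MAX_DENIED_TOPICS} denied topics")
    return topics + [topic]
```

Returning a new list rather than mutating the input keeps a failed addition from leaving a half-updated policy.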

Harmful Categories Filter

  • Enable the harmful content filter using the category checkbox.

  • Configure enforcement strength for hate, insults, sexual content, violence, and misconduct by adjusting the category sliders.

  • Enable the option to apply category filtering to responses, if required.

Agent Intent Verification

Enable intent verification to validate whether agent-generated requests align with the expected intent.

  • Set Confidence Threshold (0–1). Higher values require more confidence to block content.

3. Language Safety and Abuse Guardrails

Gibberish Detection

Identify nonsensical or meaningless inputs that may disrupt agent processing or indicate misuse.

  • Enable gibberish detection and define a confidence threshold.

Sentiment Detection

Enable sentiment detection to evaluate inputs for negative, toxic, or inappropriate sentiment, and configure a confidence threshold.

Profanity

Word filters enforce keyword- and phrase-based restrictions.

  • Enable profanity filtering to redact offensive language.

  • Add custom words or phrases:

    • Enter a word or phrase and select Add Word.

Akto supports up to 10,000 custom entries.
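To make keyword- and phrase-based redaction concrete, here is a minimal sketch of how a custom word list could be applied to text. It assumes whole-word, case-insensitive matching and a fixed mask string; Akto's actual matching rules are not documented here, and redact_words is a hypothetical helper.

```python
import re


def redact_words(text: str, words: list[str], mask: str = "***") -> str:
    """Replace whole-word, case-insensitive matches of any entry with the mask."""
    entries = [w.strip() for w in words if w.strip()]
    if not entries:
        return text
    # Longest entries first so a phrase wins over a shorter word it contains.
    entries.sort(key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(e) for e in entries) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(mask, text)
```

Escaping each entry with re.escape keeps punctuation in a custom phrase from being read as regex syntax.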

4. Sensitive Information Guardrails

Configure controls to detect, block, or anonymise sensitive data.

Personally Identifiable Information Types

  • Select the predefined PII types to detect.

  • For each selected PII type, configure the guardrail behavior:

    • Block to deny the request or response containing the detected PII.

    • Mask to redact the detected PII before processing or returning the content.
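The difference between the two behaviours can be sketched as follows. This is an illustration only: it uses a simple email regex as a stand-in for one PII type, and PiiAction, apply_pii_rule, and the [EMAIL] mask are hypothetical names, not Akto's detectors or output format.

```python
import re
from enum import Enum


class PiiAction(Enum):
    BLOCK = "block"
    MASK = "mask"


# Simplified email pattern used as a stand-in for a predefined PII type.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def apply_pii_rule(text: str, action: PiiAction) -> tuple[bool, str]:
    """Return (allowed, text_to_forward) for the given content."""
    if not EMAIL.search(text):
        return True, text          # nothing detected, pass through unchanged
    if action is PiiAction.BLOCK:
        return False, ""           # deny the whole request or response
    return True, EMAIL.sub("[EMAIL]", text)  # redact, then let it continue
```

For example, apply_pii_rule("mail bob@example.com now", PiiAction.MASK) forwards "mail [EMAIL] now", while PiiAction.BLOCK denies the same content outright.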

Regex Pattern

Akto detects sensitive data based on defined patterns.

  • Add up to 10 custom regex patterns for structured data detection.
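Before adding a pattern in the console, it can help to check that it compiles and matches what you expect. The sketch below does that with Python's re module; the pattern name employee_id and the helper functions are hypothetical, and Akto's regex dialect may differ from Python's.

```python
import re

MAX_PATTERNS = 10  # limit stated in the documentation above


def compile_patterns(patterns: dict[str, str]) -> dict[str, re.Pattern]:
    """Compile named patterns, rejecting invalid ones and lists over the limit."""
    if len(patterns) > MAX_PATTERNS:
        raise ValueError(f"at most {MAX_PATTERNS} custom patterns are allowed")
    compiled = {}
    for name, expression in patterns.items():
        try:
            compiled[name] = re.compile(expression)
        except re.error as exc:
            raise ValueError(f"invalid regex for {name!r}: {exc}") from exc
    return compiled


def matching_patterns(text: str, compiled: dict[str, re.Pattern]) -> list[str]:
    """Return the names of all patterns found anywhere in the text."""
    return [name for name, regex in compiled.items() if regex.search(text)]
```

A quick check such as matching_patterns("ticket for EMP-123456", compile_patterns({"employee_id": r"\bEMP-\d{6}\b"})) confirms the pattern fires on a known sample.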

Secrets Detection

Akto detects API keys, passwords, and similar sensitive values.

  • Enable secrets detection and set the confidence threshold.

Sensitive Data Anonymisation

Akto replaces detected sensitive data with placeholders while preserving original values securely for controlled restoration.
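The placeholder-and-restore idea can be sketched as below. This is a conceptual illustration under stated assumptions: an in-memory dictionary stands in for the secure store, an email regex stands in for the detector, and the <EMAIL_n> placeholder format is invented for the example.

```python
import re

# Simplified email pattern used as a stand-in for a sensitive-data detector.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def anonymise(text: str) -> tuple[str, dict[str, str]]:
    """Swap each detected value for a placeholder and record the mapping."""
    mapping: dict[str, str] = {}

    def replace(match: re.Match) -> str:
        placeholder = f"<EMAIL_{len(mapping) + 1}>"
        mapping[placeholder] = match.group(0)
        return placeholder

    return EMAIL.sub(replace, text), mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back in place of their placeholders."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text
```

Because the mapping is kept separately from the anonymised text, the content can be processed without the original values and restored only where that is permitted.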

5. Advanced Code Detection Filters

Configure detection and blocking of programming code and code injection attempts.

Code Detection Filter

Akto detects code patterns across supported programming languages and blocks content based on the configured level.

  • Enable the code detection filter and set the Code Detection Level.

Ban Code Detection

Enable ban code detection to block all code regardless of programming language.

  • Set the Confidence Threshold (0–1) to control detection strictness.

Behaviour

  • Lower threshold values enforce stricter blocking.

  • Higher threshold values allow more permissive detection.
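The threshold behaviour above can be sketched as a single comparison. This assumes the detector returns a confidence score between 0 and 1 and that content is blocked when the score is strictly greater than the threshold; should_block is a hypothetical helper and the exact boundary handling in Akto is not documented here.

```python
def should_block(confidence: float, threshold: float) -> bool:
    """Block when the detector's confidence exceeds the configured threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return confidence > threshold


# The same detection score is handled differently as the threshold moves.
score = 0.6
strict = should_block(score, threshold=0.4)      # lower threshold blocks
permissive = should_block(score, threshold=0.8)  # higher threshold allows
```

With a score of 0.6, the 0.4 threshold blocks the content and the 0.8 threshold lets it through, which is why lower values are the stricter setting.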

6. Custom Guardrails

LLM prompt based rule

This rule uses an LLM to classify content against a custom prompt.

  • Define the evaluation prompt.

  • Configure a confidence score threshold.

  • Content is blocked when the model confidence exceeds the configured threshold.

External model based evaluation

External evaluation allows integration with third-party or internal scoring systems.

  • Provide the external evaluation endpoint URL.

  • Configure a confidence threshold that determines enforcement actions based on the external response.

7. Usage Based Guardrails

Token Limit Detection

Akto evaluates input size and detects requests that exceed acceptable token limits.

  • Enable token limit detection and set the confidence threshold.

Behaviour

  • Higher threshold values allow larger inputs.

  • Lower threshold values enforce stricter limits and block oversized requests.

8. Anomaly Detection (Coming Soon)

Availability Status

Anomaly detection guardrails are currently under development and not yet configurable.

Anomaly Categories

You will be able to define rules across three categories:

  • Statistical Anomalies: Detect deviations from normal usage patterns and metrics.

  • Behavioral Anomalies: Identify unexpected agent actions or tool usage patterns.

  • Structural Anomalies: Detect irregularities in request structure or interaction flows.

Expected Capability

Anomaly detection will enable proactive identification of abnormal patterns and potential security risks across agent workflows.

9. Tool Guardrails

Tool Misuse

Evaluate tool invocation patterns and block suspicious activity.

  • Enable tool misuse detection to identify unauthorized or unsafe tool usage by agents.

Detect Malicious Tool

Block tools that exhibit unsafe or exploitative characteristics.

  • Enable malicious tool detection to identify tools with harmful behavior or intent.

Detect Tool Name and Description Mismatch

Detect cases where tool behavior does not align with declared metadata.

  • Enable mismatch detection to identify inconsistencies between tool name and description.

10. Access Restrictions

Configure controls to restrict which hosts, paths, and account types can interact with your agents.

Block host / path

Block outbound traffic by host or path pattern to prevent agents from reaching unauthorised external services or endpoints.

  • Enter a host or path pattern in the Host or path pattern field.

  • Select Add to include the pattern in the blocked list.

The Blocked patterns list displays all configured entries. You can review and remove patterns from this list at any time.

Akto sits between your agents and their outbound traffic. When a request is made, Akto checks the destination host or path against the blocked patterns and denies any match before it reaches the target.
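The destination check described above can be sketched as follows. The pattern syntax Akto accepts is not specified on this page, so this example assumes shell-style wildcards (such as *.example.com or /admin/*) and uses Python's fnmatch module; is_blocked is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def is_blocked(url: str, blocked_patterns: list[str]) -> bool:
    """Return True if the URL's host, path, or host+path matches any pattern."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    candidates = (host, path, host + path)
    return any(
        fnmatchcase(candidate, pattern)
        for pattern in blocked_patterns
        for candidate in candidates
    )
```

For instance, with the blocked list ["*.example.com", "/admin/*"], a request to https://api.example.com/v1/data or https://safe.test/admin/users would be denied, while https://safe.test/public would pass.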

Block personal accounts

Prevent users with personal or consumer email accounts from accessing the AI agent. Enterprise accounts using company email domains are allowed through.

  • Enable personal account blocking to restrict access to organisation-managed accounts only.
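One simple way to reason about this control is as a check on the email domain. The sketch below is only an illustration: the list of consumer domains is a small invented sample, not the list Akto uses, and is_personal_account is a hypothetical helper.

```python
# Illustrative sample only; the domains Akto treats as personal are not listed here.
PERSONAL_DOMAINS = frozenset({"gmail.com", "outlook.com", "yahoo.com"})


def is_personal_account(email: str) -> bool:
    """Return True when the address belongs to a known consumer email domain."""
    _, separator, domain = email.rpartition("@")
    if not separator or not domain.strip():
        raise ValueError("not an email address")
    return domain.strip().lower() in PERSONAL_DOMAINS
```

Under this assumption, is_personal_account("dana@gmail.com") is true and the user is denied, while an address on a company domain is allowed through.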

11. Server and Application Settings

Configure which servers the guardrail should be applied to and specify whether it applies to requests, responses, or both.

Server Targeting

Choose how the guardrail is deployed across your infrastructure:

  • Apply to all: Applies the guardrail to all servers. A count of affected servers is displayed as a link — select it to view the full list.

  • Edit servers: Manually select the specific servers to target. Choose from:

    • MCP Servers: Select the MCP servers where the guardrail should be applied.

    • Agent Servers: Select the agent servers where the guardrail should be applied.

    • Browser LLMs: Select the browser LLMs where the guardrail should be applied.

Rule Behaviour

Choose how Akto responds when a guardrail condition is triggered:

  • Block: Stops the request or response when the condition is met.

  • Warn: Notifies the user and allows continuation after acknowledgment.

  • Alert: Generates an alert for review without blocking content.
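The three behaviours can be compared side by side in a small sketch. The RuleBehaviour enum, the Outcome fields, and the resolve function are hypothetical names chosen for illustration; in particular, whether Block or Warn also record an alert is not stated above, so this sketch leaves that flag off for both.

```python
from dataclasses import dataclass
from enum import Enum


class RuleBehaviour(Enum):
    BLOCK = "block"
    WARN = "warn"
    ALERT = "alert"


@dataclass(frozen=True)
class Outcome:
    deliver: bool                # does the content continue?
    needs_acknowledgement: bool  # must the user confirm first?
    alert_raised: bool           # is an alert generated for review?


def resolve(behaviour: RuleBehaviour, triggered: bool) -> Outcome:
    """Map a triggered guardrail condition to what happens to the content."""
    if not triggered:
        return Outcome(deliver=True, needs_acknowledgement=False, alert_raised=False)
    if behaviour is RuleBehaviour.BLOCK:
        return Outcome(deliver=False, needs_acknowledgement=False, alert_raised=False)
    if behaviour is RuleBehaviour.WARN:
        return Outcome(deliver=True, needs_acknowledgement=True, alert_raised=False)
    return Outcome(deliver=True, needs_acknowledgement=False, alert_raised=True)
```

Only Block stops the content; Warn and Alert both let it continue, differing in whether the user or a reviewer is the one notified.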

Apply to Requests and Responses

Enable enforcement based on traffic direction:

  • Apply to Requests: Evaluates user inputs before processing.

  • Apply to Responses: Evaluates model outputs before delivery.

You can enable either option independently or both together based on enforcement requirements.

OWASP Agentic Risk Tags

Guardrail policies include OWASP-aligned risk tags. You can click each tag in the UI to understand the associated risk category and its security impact on agent behaviour.

Save the Guardrail Policy

After completing the required and optional configurations:

  • Select Create Policy to save the policy and apply enforcement to the selected scope.

Test guardrail behaviour in the playground

The playground allows your team to validate guardrail behaviour before updating the policy.

  • Enter a prompt in the Test your guardrail policy field to simulate a request against the configured guardrail policy.

  • The playground evaluates the prompt using the selected guardrail configuration and displays the enforcement result.

  • You can also use the Quick Test Prompts provided in the playground to test common scenarios such as sensitive data exposure, prompt injection attempts, or abusive language.

Testing in the playground helps your security team verify that guardrail conditions correctly detect violations and return the expected blocked response message before the policy is finalised.

What’s Next

You can modify, disable, or delete existing guardrail policies after creation.

To continue, learn how to manage guardrail policies in Manage Guardrail Policies.
