Create Guardrail Policies
Overview
The Create Guardrail Policies page allows you to define enforcement rules that evaluate agent requests and responses against security, safety, and compliance criteria.
Each guardrail policy combines metadata, content filters, detection logic, and deployment scope into a single configuration that applies consistently across selected agents and servers.
Access Guardrail Policies
You can access guardrail policy configuration from the Akto console.
Navigate to the Agentic Security product.
Select Agentic Guardrails → Guardrail Policies.
The guardrail policies list displays existing policies and provides access to policy creation.
Create a Guardrail Policy
Access the Create Guardrail Form
Locate the Create Guardrail button in the top-right corner of the Guardrail Policies page.
Select Create Guardrail to open the guardrail configuration form.

Fill the Configuration Form
The configuration form is organised into multiple sections, each targeting a specific enforcement layer.
1. Provide Guardrail Policy Details (Mandatory)
This section defines identifying metadata and user-facing enforcement messages.
Enter a policy name and description.
Select the severity level to classify the importance of violations generated by the guardrail policy.
Define the blocked message shown when a request is denied.
Enable the option to apply the same blocked message to responses, if consistent messaging is required across requests and responses.
This section is required to create a guardrail policy.

2. Content & Policy Guardrails
Configure content moderation and policy-based controls:
Prompt Injection Attack Filter
Enable the prompt attack filter to detect jailbreaks and manipulation attempts.
Adjust detection strength using the horizontal slider.
Context Poisoning Attacks
Enable the context poisoning filter to detect attempts to manipulate agent memory or context.
Add Denied Topics
Denied topics block specific concepts in user inputs or model responses.
Select Add Denied Topic to open the topic configuration form.
Provide a topic name, definition, and optional sample phrases.
Use + Add to include additional topics.
Select Save Topic after completing topic configuration.
You can add up to 30 denied topics per guardrail policy.
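The denied topic fields and the 30-topic limit can be pictured as a small data structure. The sketch below is illustrative only: `DeniedTopic` and `add_denied_topic` are hypothetical names, not part of any Akto API, and the validation rules beyond the 30-topic cap are assumptions.

```python
from dataclasses import dataclass, field

MAX_DENIED_TOPICS = 30  # limit stated in this guide


@dataclass
class DeniedTopic:
    """Hypothetical stand-in for one entry in the denied topic form."""
    name: str
    definition: str
    sample_phrases: list[str] = field(default_factory=list)  # optional


def add_denied_topic(topics: list[DeniedTopic], topic: DeniedTopic) -> list[DeniedTopic]:
    """Return a new list with the topic appended, enforcing the 30-topic cap."""
    # Assumption: a topic without a name or definition is rejected.
    if not topic.name.strip() or not topic.definition.strip():
        raise ValueError("a denied topic needs a name and a definition")
    if len(topics) >= MAX_DENIED_TOPICS:
        raise ValueError(f"a policy can hold at most {MAX_DENIED_TOPICS} denied topics")
    return topics + [topic]


topics = add_denied_topic([], DeniedTopic(
    name="Investment advice",
    definition="Recommendations about buying or selling financial products.",
    sample_phrases=["Which stock should I buy?"],
))
```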
Harmful Categories Filter
Enable the harmful content filter using the category checkbox.
Configure enforcement strength for hate, insults, sexual content, violence, and misconduct by adjusting the category sliders.
Enable the option to apply category filtering to responses, if required.
Agent Intent Verification
Enable intent verification to validate whether agent-generated requests align with expected intent.
Set Confidence Threshold (0–1). Higher values require more confidence to block content.

3. Language Safety and Abuse Guardrails
Gibberish Detection
Identify nonsensical or meaningless inputs that may disrupt agent processing or indicate misuse.
Enable gibberish detection and define a confidence threshold.
Sentiment Detection
Enable sentiment detection to evaluate inputs for negative, toxic, or inappropriate sentiment, and configure a confidence threshold.
Profanity
Word filters enforce keyword- and phrase-based restrictions.
Enable profanity filtering to redact offensive language.
Add custom words or phrases:
Enter a word or phrase and select Add Word.
Akto supports up to 10,000 custom entries.
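To illustrate what keyword- and phrase-based redaction can look like, here is a minimal sketch. It assumes case-insensitive whole-word matching and asterisk replacement; Akto's actual matching and redaction format may differ, and `build_word_filter` and `redact` are hypothetical helper names.

```python
import re

MAX_CUSTOM_ENTRIES = 10_000  # limit stated in this guide


def build_word_filter(entries: list[str]) -> re.Pattern:
    """Compile custom words and phrases into one case-insensitive pattern."""
    cleaned = {entry.strip() for entry in entries if entry.strip()}
    if not cleaned:
        raise ValueError("at least one word or phrase is required")
    if len(cleaned) > MAX_CUSTOM_ENTRIES:
        raise ValueError("too many custom entries")
    # Longest entries first so a phrase wins over a shorter word it contains.
    ordered = sorted(cleaned, key=len, reverse=True)
    alternatives = "|".join(re.escape(entry) for entry in ordered)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


def redact(text: str, word_filter: re.Pattern) -> str:
    """Replace every match with asterisks of the same length."""
    return word_filter.sub(lambda match: "*" * len(match.group()), text)


word_filter = build_word_filter(["darn", "son of a gun"])
print(redact("Darn it, you son of a gun!", word_filter))
```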

4. Sensitive Information Guardrails
Configure controls to detect, block, or anonymise sensitive data.
Personally Identifiable Information Types
Select the predefined PII types to detect.
For each selected PII type, configure the guardrail behavior:
Block to deny the request or response containing the detected PII.
Mask to redact the detected PII before processing or returning the content.
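The difference between Block and Mask can be sketched as follows. This is an illustration under stated assumptions: the single email regex stands in for a real PII detector, and `PiiAction`, `apply_pii_rule`, and the `[EMAIL]` placeholder are hypothetical, not Akto identifiers.

```python
import re
from enum import Enum


class PiiAction(Enum):
    BLOCK = "block"
    MASK = "mask"


# Illustrative detector only; real PII detection is more involved than one regex.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def apply_pii_rule(text: str, action: PiiAction, blocked_message: str) -> tuple[bool, str]:
    """Return (allowed, content) after applying the chosen behaviour."""
    if not EMAIL.search(text):
        return True, text  # nothing detected, content passes unchanged
    if action is PiiAction.BLOCK:
        return False, blocked_message  # deny the whole request or response
    return True, EMAIL.sub("[EMAIL]", text)  # redact and let the content through


print(apply_pii_rule("Contact jane@example.com", PiiAction.MASK, "Blocked by policy"))
print(apply_pii_rule("Contact jane@example.com", PiiAction.BLOCK, "Blocked by policy"))
```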
Regex Pattern
Akto detects sensitive data based on defined patterns.
Add up to 10 custom regex patterns for structured data detection.
Secrets Detection
Akto detects API keys, passwords, and similar sensitive values.
Enable secrets detection and set the confidence threshold.
Sensitive Data Anonymisation
Akto replaces detected sensitive data with placeholders while preserving original values securely for controlled restoration.
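Placeholder substitution with later restoration can be sketched in a few lines. The detector pattern, the `<SENSITIVE_n>` placeholder format, and the in-memory `vault` dictionary are all assumptions for illustration; this guide does not describe how Akto stores or restores the original values.

```python
import re

# Hypothetical detector: matches values shaped like US social security numbers.
SENSITIVE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def anonymise(text: str) -> tuple[str, dict[str, str]]:
    """Swap each detected value for a numbered placeholder and keep a mapping."""
    vault: dict[str, str] = {}  # placeholder -> original value
    seen: dict[str, str] = {}   # original value -> placeholder

    def replace(match: re.Match) -> str:
        value = match.group()
        if value not in seen:
            placeholder = f"<SENSITIVE_{len(seen) + 1}>"
            seen[value] = placeholder
            vault[placeholder] = value
        return seen[value]

    return SENSITIVE.sub(replace, text), vault


def restore(text: str, vault: dict[str, str]) -> str:
    """Put the original values back in place of their placeholders."""
    for placeholder, value in vault.items():
        text = text.replace(placeholder, value)
    return text


original = "SSN 123-45-6789 on file, confirm 123-45-6789"
masked, vault = anonymise(original)
print(masked)
```

In this sketch the same value always maps to the same placeholder, so the anonymised text stays internally consistent.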

5. Advanced Code Detection Filters
Configure detection and blocking of programming code and code injection attempts.
Code Detection Filter
Akto detects code patterns across supported programming languages and blocks content based on the configured level.
Enable the code detection filter and set the Code Detection Level.
Ban Code Detection
Enable ban code detection to block all code regardless of programming language.
Set the Confidence Threshold (0–1) to control detection strictness.
Behaviour
Lower threshold values enforce stricter blocking.
Higher threshold values allow more permissive detection.
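The effect of the threshold can be illustrated with a small comparison. This is a sketch that assumes content is blocked when the detector's confidence reaches the threshold; the exact comparison Akto uses is not stated here, and the scores are made-up values.

```python
def is_blocked(confidence: float, threshold: float) -> bool:
    """Block when the detector's confidence reaches the threshold (assumed rule)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return confidence >= threshold


# Hypothetical detector scores for three inputs.
scores = {"plain prose": 0.10, "inline snippet": 0.55, "full script": 0.95}

for threshold in (0.3, 0.9):
    blocked = [name for name, score in scores.items() if is_blocked(score, threshold)]
    print(f"threshold={threshold}: blocked {blocked}")
```

With the lower threshold, both the inline snippet and the full script are blocked; with the higher one, only the full script is.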

6. Custom Guardrails
LLM prompt based rule
This rule uses an LLM to classify content against a custom prompt.
Define the evaluation prompt.
Configure a confidence score threshold.
Content is blocked when the model confidence exceeds the configured threshold.
External model based evaluation
External evaluation allows integration with third-party or internal scoring systems.
Provide the external evaluation endpoint URL.
Configure a confidence threshold that determines enforcement actions based on the external response.
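Both custom rule types follow the same shape: something scores the content, and the score is compared with the threshold. The sketch below models that with a pluggable scorer. `CustomRule`, `Scorer`, and `keyword_scorer` are hypothetical names; the toy scorer stands in for an LLM call or an external endpoint, whose request and response formats are not described in this guide.

```python
from dataclasses import dataclass
from typing import Callable

# A scorer takes (evaluation prompt, content) and returns a confidence in [0, 1].
Scorer = Callable[[str, str], float]


@dataclass
class CustomRule:
    prompt: str
    threshold: float
    scorer: Scorer  # in practice an LLM call or an external evaluation client

    def blocks(self, content: str) -> bool:
        """Block when the confidence exceeds the configured threshold."""
        confidence = self.scorer(self.prompt, content)
        return confidence > self.threshold


def keyword_scorer(prompt: str, content: str) -> float:
    """Toy scorer so the sketch runs offline; not a real model."""
    return 0.9 if "refund" in content.lower() else 0.1


rule = CustomRule(prompt="Flag refund requests", threshold=0.7, scorer=keyword_scorer)
print(rule.blocks("I want a REFUND now"))
print(rule.blocks("Hello there"))
```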

7. Usage Based Guardrails
Token Limit Detection
Akto evaluates input size and detects requests that exceed acceptable token limits.
Enable token limit detection and set the confidence threshold.
Behaviour
Higher threshold values allow larger inputs.
Lower threshold values enforce stricter limits and block oversized requests.

8. Anomaly Detection (Coming Soon)
Availability Status
Anomaly detection guardrails are currently under development and not yet configurable.
Anomaly Categories
You will be able to define rules across three categories:
Statistical Anomalies: Detect deviations from normal usage patterns and metrics.
Behavioral Anomalies: Identify unexpected agent actions or tool usage patterns.
Structural Anomalies: Detect irregularities in request structure or interaction flows.
Expected Capability
Anomaly detection will enable proactive identification of abnormal patterns and potential security risks across agent workflows.
9. Tool Guardrails
Tool Misuse
Evaluate tool invocation patterns and block suspicious activity.
Enable tool misuse detection to identify unauthorized or unsafe tool usage by agents.
Detect Malicious Tool
Block tools that exhibit unsafe or exploitative characteristics.
Enable malicious tool detection to identify tools with harmful behavior or intent.
Detect Tool Name and Description Mismatch
Detect cases where tool behaviour does not align with declared metadata.
Enable mismatch detection to identify inconsistencies between tool name and description.

10. Access Restrictions
Configure controls to restrict which hosts, paths, and account types can interact with your agents.
Block host / path
Block outbound traffic by host or path pattern to prevent agents from reaching unauthorised external services or endpoints.
Enter a host or path pattern in the Host or path pattern field.
Select Add to include the pattern in the blocked list.
The Blocked patterns list displays all configured entries. You can review and remove patterns from this list at any time.
Akto sits between your agents and their outbound traffic. When a request is made, Akto checks the destination host or path against the blocked patterns and denies any match before it reaches the target.
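The destination check described above can be sketched as a pattern match on the outbound URL. The pattern syntax is an assumption: this guide does not specify it, so the example uses shell-style wildcards and treats patterns starting with `/` as path patterns. `is_destination_blocked` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def is_destination_blocked(url: str, blocked_patterns: list[str]) -> bool:
    """Check a destination's host and path against the blocked list."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    for pattern in blocked_patterns:
        if pattern.startswith("/"):
            # Assumption: a leading slash marks a path pattern.
            if fnmatchcase(path, pattern):
                return True
        elif fnmatchcase(host, pattern.lower()):
            return True
    return False


blocked_patterns = ["*.pastebin.example", "/internal/*"]

print(is_destination_blocked("https://api.pastebin.example/upload", blocked_patterns))
print(is_destination_blocked("https://docs.example.org/public", blocked_patterns))
```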
Block personal accounts
Prevent users with personal or consumer email accounts from accessing the AI agent. Enterprise accounts using company email domains are allowed through.
Enable personal account blocking to restrict access to organisation-managed accounts only.
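One plausible way to tell personal from enterprise accounts is to compare the email domain with a list of consumer providers. The sketch below assumes that approach; the domain list is deliberately short and hypothetical, and Akto's actual classification logic is not documented here.

```python
# Hypothetical, deliberately short list of consumer email domains.
PERSONAL_DOMAINS = {"gmail.com", "outlook.com", "yahoo.com"}


def is_personal_account(email: str) -> bool:
    """Treat an address as personal when its domain is a known consumer provider."""
    _, separator, domain = email.strip().rpartition("@")
    if not separator or not domain:
        raise ValueError(f"not an email address: {email!r}")
    return domain.lower() in PERSONAL_DOMAINS


def is_access_allowed(email: str, block_personal_accounts: bool) -> bool:
    """Deny personal accounts only when the option is enabled."""
    return not (block_personal_accounts and is_personal_account(email))


print(is_access_allowed("sam@gmail.com", block_personal_accounts=True))
print(is_access_allowed("sam@acme.example", block_personal_accounts=True))
```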

11. Server and Application Settings
Configure which servers the guardrail should be applied to and specify whether it applies to requests, responses, or both.
Server Targeting
Choose how the guardrail is deployed across your infrastructure:
Apply to all: Applies the guardrail to all servers. A count of affected servers is displayed as a link — select it to view the full list.
Edit servers: Manually select the specific servers to target. Choose from:
MCP Servers: Select the MCP servers where the guardrail should be applied.
Agent Servers: Select the agent servers where the guardrail should be applied.
Browser LLMs: Select the browser LLMs where the guardrail should be applied.
Rule Behaviour
Choose how Akto responds when a guardrail condition is triggered:
Block: Stops the request or response when the condition is met.
Warn: Notifies the user and allows continuation after acknowledgment.
Alert: Generates an alert for review without blocking content.
Apply to Requests and Responses
Enable enforcement based on traffic direction:
Apply to Requests: Evaluates user inputs before processing.
Apply to Responses: Evaluates model outputs before delivery.
You can enable either option independently or both together based on enforcement requirements.
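The interaction between rule behaviour and traffic direction can be modelled as a small decision function. The names below (`Behaviour`, `Direction`, `Enforcement`) and the outcome strings are hypothetical and serve only to illustrate the settings described in this section.

```python
from dataclasses import dataclass
from enum import Enum


class Behaviour(Enum):
    BLOCK = "block"
    WARN = "warn"
    ALERT = "alert"


class Direction(Enum):
    REQUEST = "request"
    RESPONSE = "response"


@dataclass
class Enforcement:
    behaviour: Behaviour
    apply_to_requests: bool
    apply_to_responses: bool

    def outcome(self, direction: Direction, violated: bool) -> str:
        """Describe what happens to traffic in the given direction."""
        in_scope = (
            self.apply_to_requests
            if direction is Direction.REQUEST
            else self.apply_to_responses
        )
        if not (in_scope and violated):
            return "allow"
        if self.behaviour is Behaviour.BLOCK:
            return "deny"
        if self.behaviour is Behaviour.WARN:
            return "allow after acknowledgement"
        return "allow and raise alert"


requests_only = Enforcement(Behaviour.BLOCK, apply_to_requests=True, apply_to_responses=False)
print(requests_only.outcome(Direction.REQUEST, violated=True))
print(requests_only.outcome(Direction.RESPONSE, violated=True))
```

In this sketch, a violation in a direction that the policy does not cover passes through untouched, which is why the second call reports an allowed response.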

Guardrail Deployment Scope Behaviour
By default, Akto does not apply a newly created guardrail policy to any MCP server or agent server. A guardrail policy becomes active only after you explicitly select the target MCP servers and agent servers in the Server and Application Settings section.
OWASP Agentic Risk Tags
Guardrail policies include OWASP-aligned risk tags. You can click each tag in the UI to understand the associated risk category and its security impact on agent behaviour.
Save the Guardrail Policy
After completing the required and optional configurations:
Select Create Policy to save the policy and apply enforcement to the selected scope.
Test guardrail behaviour in the playground
The playground allows your team to validate guardrail behaviour before updating the policy.
Enter a prompt in the Test your guardrail policy field to simulate a request against the configured guardrail policy.
The playground evaluates the prompt using the selected guardrail configuration and displays the enforcement result.
You can also use the Quick Test Prompts provided in the playground to test common scenarios such as sensitive data exposure, prompt injection attempts, or abusive language.

Playground testing helps your security team verify that guardrail conditions correctly detect violations and return the expected blocked message before the policy is finalised.
What’s Next
You can modify, disable, or delete existing guardrail policies after creation.
To continue, learn how to manage guardrail policies on the Manage Guardrail Policies page.