# Create Guardrail Policies

## Overview

The **Create Guardrail Policies** page allows you to define enforcement rules that evaluate agent requests and responses against security, safety, and compliance criteria.

Each guardrail policy combines metadata, content filters, detection logic, and deployment scope into a single configuration that applies consistently across selected agents and servers.

## Access Guardrail Policies

You can access guardrail policy configuration from the Akto console.

* Navigate to the **Agentic Security** product.
* Select **Agentic Guardrails → Guardrail Policies**.

The guardrail policies list displays existing policies and provides access to policy creation.

## Create a Guardrail Policy

### Access the Create Guardrail Form

1. Locate the **Create Guardrail** button in the top-right corner of the Guardrail Policies page.
2. Select **Create Guardrail** to open the guardrail configuration form.

   <figure><img src="https://3128331180-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ftog5ODwYfqPOf4eQhsOC%2Fuploads%2Fgit-blob-e314f83e3c0bf61ed07e6b07fe9eac1bfcf39521%2Fimage%20(27).png?alt=media" alt="" width="563"><figcaption></figcaption></figure>

### Fill the Configuration Form

The configuration form is organised into nine sections, each targeting a specific enforcement layer.

<details>

<summary>1. Provide Guardrail Policy Details (Mandatory)</summary>

This section defines identifying metadata and user-facing enforcement messages.

* Enter a **policy name** and **description**.
* Select the **severity level** to classify the importance of violations generated by the guardrail policy.
* Define the **blocked message** shown when a request is denied.
* Enable the option to **apply the same blocked message to responses**, if consistent messaging is required across requests and responses.

{% hint style="info" %}
This section is required to create a guardrail policy.
{% endhint %}

<div data-with-frame="true"><figure><img src="https://3128331180-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ftog5ODwYfqPOf4eQhsOC%2Fuploads%2FuO4zS21yomMsppZpckIQ%2Fimage.png?alt=media&#x26;token=ad1e452c-6bbb-48e9-bd4e-f0a920dc310f" alt="" width="563"><figcaption></figcaption></figure></div>

</details>

<details>

<summary>2. Content &#x26; Policy Guardrails</summary>

Configure content moderation and policy-based control:

**Enable Prompt Injection Attack Filter**

* Enable the prompt attack filter to detect jailbreaks and manipulation attempts.
* Adjust detection strength using the horizontal slider.

**Enable Harmful Categories Filter**

* Enable the harmful content filter using the category checkbox.
* Configure enforcement strength for **hate, insults, sexual content, violence, and misconduct** by adjusting the category sliders.
* Enable the option to apply category filtering to **responses**, if required.

**Add Denied Topics**

Denied topics block specific concepts in user inputs or model responses.

* Select **Add Denied Topic** to open the topic configuration form.
* Provide a **topic name**, **definition**, and optional **sample phrases**.
* Use **+ Add** to include additional topics.
* Select **Save Topic** after completing topic configuration.

{% hint style="info" %}
You can add up to **30 denied topics** per guardrail policy.
{% endhint %}

**Agent Intent Verification**

Enable intent verification to validate whether agent-generated requests align with expected intent.

* Set **Confidence Threshold (0–1)**. Higher values require more confidence to block content.

<div data-with-frame="true"><figure><img src="https://3128331180-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ftog5ODwYfqPOf4eQhsOC%2Fuploads%2FYiEzThtN9hz9DvxI9Wqq%2Fimage.png?alt=media&#x26;token=ca022c81-0879-443a-bd6b-449074a3e290" alt="" width="563"><figcaption></figcaption></figure></div>

</details>

<details>

<summary>3. Language Safety and Abuse Guardrails</summary>

**Gibberish Detection**

Identify nonsensical or meaningless inputs that may disrupt agent processing or indicate misuse.

* Enable gibberish detection and define a confidence threshold.

**Sentiment Analysis**

Sentiment analysis evaluates inputs for negative, toxic, or inappropriate sentiment.

* Enable sentiment detection and configure a confidence threshold.

**Profanity**

Word filters enforce keyword- and phrase-based restrictions.

* Enable **profanity filtering** to redact offensive language.
* Add **custom words or phrases**:
  * Enter a word or phrase and select **Add Word**.

{% hint style="info" %}
Akto supports up to **10,000 custom entries**.
{% endhint %}

<div data-with-frame="true"><figure><img src="https://3128331180-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ftog5ODwYfqPOf4eQhsOC%2Fuploads%2FuzXDoZl3la0Xr7eKN9Rs%2Fimage.png?alt=media&#x26;token=978fb4f4-f03b-497e-83d2-460111fae004" alt="" width="563"><figcaption></figcaption></figure></div>

</details>

<details>

<summary>4. Sensitive Information Guardrails</summary>

Configure controls to detect, block, or anonymise sensitive data.

**Personally Identifiable Information**

* Select predefined **PII types** to detect and block.
* For each selected PII type, configure the **guardrail behavior**:
  * **Block** to deny the request or response containing the detected PII.
  * **Mask** to redact the detected PII before processing or returning the content.
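
The two behaviours above can be sketched as follows. This is an illustrative sketch only: the `pii_findings` structure and the `[TYPE]` mask token are assumptions for demonstration, not Akto's actual implementation.

```python
def apply_pii_guardrail(text, pii_findings, behavior):
    """Apply a per-PII-type behaviour: 'block' denies the content
    outright, 'mask' redacts each detected value in place."""
    if behavior == "block":
        return None  # deny the whole request/response
    if behavior == "mask":
        for value, pii_type in pii_findings:
            text = text.replace(value, f"[{pii_type}]")
        return text
    return text

masked = apply_pii_guardrail(
    "Contact alice@example.com",
    [("alice@example.com", "EMAIL")],
    "mask",
)
# masked == "Contact [EMAIL]"
```

Choose **Mask** when downstream processing should continue without the sensitive value; choose **Block** when any occurrence of that PII type must stop the traffic entirely.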

**Regex-Based Detection**

Akto detects sensitive data based on defined patterns.

* Add up to **10 custom regex patterns** for structured data detection.
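
The patterns you add behave like standard regular expressions matched against request and response content. The patterns below are illustrative examples only (the `IK-` key format is hypothetical); the regexes you actually configure depend on the data formats you need to catch.

```python
import re

# Example custom patterns: a US SSN shape and a hypothetical internal key format.
patterns = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "internal_key": re.compile(r"\bIK-[A-Z0-9]{16}\b"),
}

def find_sensitive(text):
    """Return the names of all patterns that match the text."""
    return sorted(name for name, rx in patterns.items() if rx.search(text))

find_sensitive("SSN 123-45-6789 and key IK-ABCD1234EFGH5678")
# -> ["internal_key", "us_ssn"]
```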

**Secrets Detection**

Akto detects API keys, passwords, and similar sensitive values.

* Enable secrets detection and set the confidence threshold.

**Sensitive Data Anonymisation**

Akto replaces detected sensitive data with placeholders while preserving original values securely for controlled restoration.
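
Conceptually, anonymisation is a reversible substitution. The sketch below assumes a simple in-memory vault and a `<PII_n>` placeholder format for illustration; Akto stores the originals securely server-side for controlled restoration.

```python
import itertools

class Anonymiser:
    def __init__(self):
        self._vault = {}                    # placeholder -> original value
        self._counter = itertools.count(1)

    def anonymise(self, text, values):
        """Replace each sensitive value with a placeholder and remember it."""
        for value in values:
            placeholder = f"<PII_{next(self._counter)}>"
            self._vault[placeholder] = value
            text = text.replace(value, placeholder)
        return text

    def restore(self, text):
        """Reverse the substitution for controlled restoration."""
        for placeholder, value in self._vault.items():
            text = text.replace(placeholder, value)
        return text

a = Anonymiser()
masked = a.anonymise("Call 555-0100", ["555-0100"])  # "Call <PII_1>"
original = a.restore(masked)                          # "Call 555-0100"
```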

<div data-with-frame="true"><figure><img src="https://3128331180-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ftog5ODwYfqPOf4eQhsOC%2Fuploads%2FjbCQ9KD9ctZoNPmS66bz%2Fimage.png?alt=media&#x26;token=177e624c-e9ce-45d1-8a14-7a25d5716e9c" alt="" width="563"><figcaption></figcaption></figure></div>

</details>

<details>

<summary>5. Advanced Code Detection Filters</summary>

Configure detection and blocking of programming code and code injection attempts.

**Code Detection Filter**

Akto detects code patterns across supported programming languages and blocks content based on the configured level.

* Enable the code detection filter and set the **Code Detection Level**.

**Ban Code Detection**

Enable ban code detection to block all code regardless of programming language.

* Set the **Confidence Threshold (0–1)** to control detection strictness.

**Behaviour**

* Lower threshold values enforce stricter blocking.
* Higher threshold values allow more permissive detection.
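
The threshold semantics above can be sketched as a single comparison. The comparison direction is an assumption consistent with "lower threshold = stricter", not Akto's documented internals.

```python
def should_block(detector_confidence, threshold):
    """Block when the detector's confidence meets or exceeds the threshold."""
    return detector_confidence >= threshold

# A detector that is 60% confident the input contains code:
should_block(0.6, threshold=0.4)  # True  - low threshold, strict blocking
should_block(0.6, threshold=0.8)  # False - high threshold, permissive
```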

<div data-with-frame="true"><figure><img src="https://3128331180-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ftog5ODwYfqPOf4eQhsOC%2Fuploads%2F4rALURqmlcLl3X97NtPC%2Fimage.png?alt=media&#x26;token=fbcb702a-ac36-4544-8f04-afda715e19aa" alt="" width="563"><figcaption></figcaption></figure></div>


</details>

<details>

<summary>6. Custom Guardrails</summary>

**LLM prompt based rule**

This rule uses an LLM to classify content against a custom prompt.

* Define the **evaluation prompt**.
* Configure a **confidence score threshold**.
* Content is blocked when the model confidence exceeds the configured threshold.

**External model based evaluation**

External evaluation allows integration with third-party or internal scoring systems.

* Provide the **external evaluation endpoint URL**.
* Configure a **confidence threshold** that determines enforcement actions based on the external response.
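
Client-side, the decision reduces to comparing the external score against the threshold. The request/response shape below (a JSON body with a `score` field) is a hypothetical contract for illustration, not Akto's documented wire format.

```python
import json

def evaluate_externally(content, threshold, call_endpoint):
    """call_endpoint sends the content to your scoring service and returns
    its raw JSON response body; block when the score exceeds the threshold."""
    body = call_endpoint(json.dumps({"content": content}))
    score = json.loads(body)["score"]
    return "block" if score > threshold else "allow"

# Stubbed endpoint that scores everything at 0.9:
decision = evaluate_externally("hello", 0.5, lambda req: '{"score": 0.9}')
# decision == "block"
```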

<div data-with-frame="true"><figure><img src="https://3128331180-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ftog5ODwYfqPOf4eQhsOC%2Fuploads%2F7flgjpMbETuRBd3olh73%2Fimage.png?alt=media&#x26;token=bf168be8-af86-4dd6-b244-e5110c472fe1" alt="" width="563"><figcaption></figcaption></figure></div>

</details>

<details>

<summary>7. Usage Based Guardrails</summary>

**Token Limit Detection**

Akto evaluates input size and detects requests that exceed acceptable token limits.

* Enable token limit detection and set the confidence threshold.

**Behaviour**

* Higher threshold values allow larger inputs.
* Lower threshold values enforce stricter limits and block oversized requests.
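
One way to read this behaviour is that the input size is scaled into a confidence score and compared against the threshold. The sketch below is an assumption for illustration: the 4-characters-per-token estimate, the 4096-token limit, and the scaling are all hypothetical, not Akto's actual computation.

```python
def token_limit_confidence(text, max_tokens=4096):
    approx_tokens = max(1, len(text) // 4)  # rough heuristic, not a real tokenizer
    return min(1.0, approx_tokens / max_tokens)

def should_block(text, threshold, max_tokens=4096):
    # Lower thresholds block smaller inputs (stricter); higher thresholds
    # only block inputs near or beyond the limit (more permissive).
    return token_limit_confidence(text, max_tokens) >= threshold

should_block("x" * 40_000, threshold=0.5)   # True: ~10,000 tokens vs 4,096 limit
should_block("short prompt", threshold=0.5)  # False
```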

<div data-with-frame="true"><figure><img src="https://3128331180-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ftog5ODwYfqPOf4eQhsOC%2Fuploads%2Fj8sNuhuTfW0KHYvFzc4o%2Fimage.png?alt=media&#x26;token=aa2b9d81-b6c1-4f06-b3b7-9e5aba201e49" alt="" width="563"><figcaption></figcaption></figure></div>

</details>

<details>

<summary>8. Anomaly Detection (Coming Soon)</summary>

**Availability Status**

Anomaly detection guardrails are currently under development and not yet configurable.

**Anomaly Categories**

You will be able to define rules across three categories:

* **Statistical Anomalies**: Detect deviations from normal usage patterns and metrics.
* **Behavioral Anomalies**: Identify unexpected agent actions or tool usage patterns.
* **Structural Anomalies**: Detect irregularities in request structure or interaction flows.

**Expected Capability**

Anomaly detection will enable proactive identification of abnormal patterns and potential security risks across agent workflows.

</details>

<details>

<summary>9. Server and Application Settings</summary>

This section defines where the guardrail policy is enforced and how violations are handled.

**Select Deployment Targets**

Akto enforces the policy only on selected servers. Select MCP servers and agent servers where the guardrail policy should be applied.

**Configure Rule Behaviour**

Choose how Akto responds when a guardrail condition is triggered:

* **Block**: Stops the request or response when the condition is met.
* **Warn**: Notifies the user and allows continuation after acknowledgment.
* **Alert**: Generates an alert for review without blocking content.
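
The three behaviours differ only in whether the traffic continues and what the user sees. A minimal sketch (the handler name and result fields are illustrative, not Akto APIs):

```python
def handle_violation(action, message):
    if action == "block":
        return {"allowed": False, "message": message}  # stop the traffic
    if action == "warn":
        return {"allowed": True, "warning": message}   # continue after notice
    if action == "alert":
        return {"allowed": True, "alerted": True}      # log for review only
    raise ValueError(f"unknown action: {action}")

handle_violation("block", "Request denied by policy")
# -> {"allowed": False, "message": "Request denied by policy"}
```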

**Apply to Requests and Responses**

Enable enforcement based on traffic direction:

* **Apply to Requests**: Evaluates user inputs before processing.
* **Apply to Responses**: Evaluates model outputs before delivery.

You can enable either option independently or both together based on enforcement requirements.

<div data-with-frame="true"><figure><img src="https://3128331180-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ftog5ODwYfqPOf4eQhsOC%2Fuploads%2F0MLzmUYxVceDh4up1zKa%2Fimage.png?alt=media&#x26;token=c51779d9-e336-4a02-84bc-39fb3595875c" alt="" width="563"><figcaption></figcaption></figure></div>

</details>

{% hint style="warning" %}
**Guardrail Deployment Scope Behaviour**

By default, Akto does not apply a newly created guardrail policy to any server or agent server.\
A guardrail policy becomes active only after you explicitly select the target **MCP servers** and **agent servers** in the **Server and Application Settings** section.
{% endhint %}

{% hint style="info" %}

### **OWASP Agentic Risk Tags**

Guardrail policies include OWASP-aligned risk tags. You can click each tag in the UI to understand the associated risk category and its security impact on agent behaviour.
{% endhint %}

### Save the Guardrail Policy

After completing the required and optional configurations:

* Select **Create Policy** to save the policy and apply enforcement to the selected scope.

  <div data-with-frame="true"><figure><img src="https://3128331180-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ftog5ODwYfqPOf4eQhsOC%2Fuploads%2Fbzo7SEIETJ5KsaeAe6JN%2Fimage.png?alt=media&#x26;token=2a0cb63b-67eb-4235-b8ce-22d63bc39986" alt="" width="563"><figcaption></figcaption></figure></div>

## Test Guardrail Behaviour in the Playground

The playground allows your team to validate guardrail behaviour before the policy is saved.

* Enter a prompt in the **Test your guardrail policy** field to simulate a request against the configured guardrail policy.
* The playground evaluates the prompt using the selected guardrail configuration and displays the enforcement result.
* You can also use the **Quick Test Prompts** provided in the playground to test common scenarios such as sensitive data exposure, prompt injection attempts, or abusive language.

<div data-with-frame="true"><figure><img src="https://3128331180-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ftog5ODwYfqPOf4eQhsOC%2Fuploads%2FHd7Io4rvP37DjRV0BX6U%2Fimage.png?alt=media&#x26;token=3d99dafd-3716-4a45-9b73-7f5441b4a3f0" alt="" width="563"><figcaption></figcaption></figure></div>

Playground probing helps your security team verify that guardrail conditions correctly detect violations and return the expected blocked response message before the policy is finalized.

## What’s Next

You can modify, disable, or delete existing guardrail policies after creation.

To continue, learn how to manage guardrail policies on the [manage-guardrail-policies](https://ai-security-docs.akto.io/agentic-guardrails/how-to/manage-guardrail-policies "mention") page.

