Create Guardrail Policies

Overview

The Create Guardrail Policies page allows you to define enforcement rules that evaluate agent requests and responses against security, safety, and compliance criteria.

Each guardrail policy combines metadata, content filters, detection logic, and deployment scope into a single configuration that applies consistently across selected agents and servers.

Access Guardrail Policies

You can access guardrail policy configuration from the Akto console.

Navigate to the Agentic Security product.
Select Agentic Guardrails → Guardrail Policies.

The guardrail policies list displays existing policies and provides access to policy creation.

Create a Guardrail Policy

Access the Create Guardrail Form

Locate the Create Guardrail button in the top-right corner of the Guardrail Policies page.
Select Create Guardrail to open the guardrail configuration form.

Fill the Configuration Form

The configuration form is organised into nine sections, each targeting a specific enforcement layer.

1. Provide Guardrail Policy Details (Mandatory)

This section defines identifying metadata and user-facing enforcement messages.

Enter a policy name and description.
Define the blocked message shown when a request is denied.
Enable the option to apply the same blocked message to responses, if consistent messaging is required across requests and responses.

This section is required to create a guardrail policy.
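The following minimal Python sketch shows how the fields in this section relate to each other. It is not Akto's actual schema: the class `PolicyDetails`, its field names, and the `message_for` helper are hypothetical. It shows how one blocked message can be reused for responses when that option is enabled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PolicyDetails:
    """Hypothetical model of the mandatory policy details section."""

    name: str
    description: str
    blocked_message: str
    # Mirrors the option that reuses the blocked message for responses.
    apply_message_to_responses: bool = False

    def __post_init__(self) -> None:
        # The section is mandatory, so reject a blank name or blocked message.
        if not self.name.strip():
            raise ValueError("policy name is required")
        if not self.blocked_message.strip():
            raise ValueError("blocked message is required")

    def message_for(self, direction: str) -> Optional[str]:
        """Return the configured blocked message for "request" or "response".

        None means this section defines no custom message for that direction.
        """
        if direction == "request":
            return self.blocked_message
        if direction == "response":
            return self.blocked_message if self.apply_message_to_responses else None
        raise ValueError(f"unknown direction: {direction!r}")
```

With the reuse option disabled, responses get no message from this section. In that case the sketch returns None rather than guessing a default.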

2. Configure Content Filters

Content filters control category-based and attack-based detection.

Harmful Content Categories

Enable the harmful content filter using the category checkbox.
Configure enforcement strength for hate, insults, sexual content, violence, and misconduct by adjusting the category sliders.
Enable the option to apply category filtering to responses, if required.

Prompt Attack Detection

Enable the prompt attack filter to detect jailbreaks and manipulation attempts.
Adjust detection strength using the horizontal slider.
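One way to picture the sliders is as score cutoffs: a stronger setting blocks content at a lower classifier score. This Python sketch assumes that model. The `Strength` levels and their numeric cutoffs are hypothetical, the per-category scores are assumed to come from some classifier in the range 0 to 1, and Akto's real slider semantics may differ.

```python
from enum import Enum
from typing import Dict, List


class Strength(Enum):
    # Hypothetical slider positions mapped to score cutoffs (assumption).
    NONE = None  # filter disabled for this category
    LOW = 0.9
    MEDIUM = 0.7
    HIGH = 0.5


CATEGORIES = ("hate", "insults", "sexual", "violence", "misconduct")


def triggered_categories(scores: Dict[str, float], strengths: Dict[str, Strength]) -> List[str]:
    """Return the categories whose score meets the cutoff for their strength."""
    hits = []
    for category in CATEGORIES:
        cutoff = strengths.get(category, Strength.NONE).value
        if cutoff is None:
            continue  # category filtering is switched off
        if scores.get(category, 0.0) >= cutoff:
            hits.append(category)
    return hits


def prompt_attack_detected(attack_score: float, strength: Strength) -> bool:
    """Apply the same cutoff idea to the single prompt-attack slider."""
    cutoff = strength.value
    return cutoff is not None and attack_score >= cutoff
```

In this model, raising a slider from LOW to HIGH lowers the cutoff. That catches more borderline content at the cost of more false positives.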

3. Add Denied Topics

Denied topics block specific concepts in user inputs or model responses.

Select Add Denied Topic to open the topic configuration form.
Provide a topic name, definition, and optional sample phrases.
Use + Add to include additional topics.
Select Save Topic after completing topic configuration.

You can add up to 30 denied topics per guardrail policy.
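The following Python sketch models a denied topic with the three fields described above and enforces the 30-topic limit. The class names are hypothetical. The matcher is a deliberately naive stand-in that checks for sample phrases as substrings. A production guardrail would more likely classify text against the topic definition semantically.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_DENIED_TOPICS = 30  # per-policy limit stated in this guide


@dataclass
class DeniedTopic:
    name: str
    definition: str
    sample_phrases: List[str] = field(default_factory=list)  # optional


class DeniedTopicList:
    """Hypothetical container for a policy's denied topics."""

    def __init__(self) -> None:
        self.topics: List[DeniedTopic] = []

    def add(self, topic: DeniedTopic) -> None:
        if len(self.topics) >= MAX_DENIED_TOPICS:
            raise ValueError(f"a policy supports at most {MAX_DENIED_TOPICS} denied topics")
        self.topics.append(topic)

    def match(self, text: str) -> Optional[str]:
        """Naive stand-in: return the first topic whose sample phrase appears in text."""
        lowered = text.lower()
        for topic in self.topics:
            if any(phrase.lower() in lowered for phrase in topic.sample_phrases):
                return topic.name
        return None
```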

4. Add Word Filters

Word filters enforce keyword- and phrase-based restrictions.

Enable profanity filtering to redact offensive language.
Add custom words or phrases:
- Enter a word or phrase and select Add Word.

Akto supports up to 10,000 custom entries.
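This Python sketch shows how a custom word list could be matched and redacted. It assumes case-insensitive, whole-word matching. It also makes two further assumptions: that matches are redacted with asterisks, and that detection alone can drive a block decision. The function names are hypothetical, and Akto's built-in profanity list is not reproduced here.

```python
import re
from typing import Iterable, List

MAX_CUSTOM_ENTRIES = 10_000  # limit stated in this guide


def build_word_filter(entries: Iterable[str]) -> "re.Pattern[str]":
    """Compile custom words and phrases into one case-insensitive pattern."""
    words: List[str] = [e.strip() for e in entries if e.strip()]
    if len(words) > MAX_CUSTOM_ENTRIES:
        raise ValueError(f"at most {MAX_CUSTOM_ENTRIES} custom entries are supported")
    if not words:
        return re.compile(r"(?!x)x")  # a pattern that never matches
    # Longest first so a phrase wins over a shorter entry it contains.
    words.sort(key=len, reverse=True)
    alternation = "|".join(re.escape(w) for w in words)
    # Lookarounds give whole-word matching even when entries contain spaces.
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)


def redact(text: str, pattern: "re.Pattern[str]") -> str:
    """Replace every match with asterisks of the same length."""
    return pattern.sub(lambda m: "*" * len(m.group(0)), text)


def contains_blocked_word(text: str, pattern: "re.Pattern[str]") -> bool:
    return pattern.search(text) is not None
```

Compiling all entries into a single alternation keeps each check to one regex pass, even as the list grows toward the 10,000-entry limit.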

5. Add Sensitive Information Filters

Sensitive information filters prevent exposure of regulated data.

Personally Identifiable Information
- Select predefined PII types to detect and block.
- For each selected PII type, configure the guardrail behavior:
  - Block to deny the request or response containing the detected PII.
  - Mask to redact the detected PII before processing or returning the content.
Regex-Based Detection
- Add up to 10 custom regex patterns for structured data detection.
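The difference between Block and Mask can be sketched in Python as follows. The two PII detectors are simplified illustrations, not Akto's predefined types. The function and type names are hypothetical. Treating a custom regex match as Block is an assumption, because this page does not say which action custom patterns use.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Sequence, Tuple

MAX_CUSTOM_REGEX = 10  # per-policy limit stated in this guide


class Action(Enum):
    BLOCK = "block"  # deny the request or response
    MASK = "mask"    # redact the match and let the content through


# Simplified, illustrative detectors; real PII detection is far more thorough.
PII_PATTERNS: Dict[str, re.Pattern[str]] = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


@dataclass
class Verdict:
    blocked: bool
    text: str
    findings: List[str] = field(default_factory=list)


def apply_sensitive_filters(
    text: str, pii_actions: Dict[str, Action], custom_regex: Sequence[str] = ()
) -> Verdict:
    if len(custom_regex) > MAX_CUSTOM_REGEX:
        raise ValueError(f"at most {MAX_CUSTOM_REGEX} custom regex patterns are supported")
    detectors: List[Tuple[str, re.Pattern[str], Action]] = [
        (name, PII_PATTERNS[name], action) for name, action in pii_actions.items()
    ]
    # Assumption: a custom regex match blocks the content.
    detectors += [(f"CUSTOM_{i}", re.compile(p), Action.BLOCK) for i, p in enumerate(custom_regex)]

    hits = [(n, pat, act) for n, pat, act in detectors if pat.search(text)]
    findings = [n for n, _, _ in hits]
    # Any Block hit denies the whole message; the text is returned unmasked for logging.
    if any(act is Action.BLOCK for _, _, act in hits):
        return Verdict(blocked=True, text=text, findings=findings)
    # Otherwise replace every Mask hit with a placeholder such as "[EMAIL]".
    for n, pat, _ in hits:
        text = pat.sub(f"[{n}]", text)
    return Verdict(blocked=False, text=text, findings=findings)
```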

6. LLM Prompt-Based Rule

This rule uses an LLM to classify content against a custom prompt.

Define the evaluation prompt.
Configure a confidence score threshold.
Content is blocked when the model confidence exceeds the configured threshold.
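The blocking rule above is a strict comparison: content equal to the threshold passes. This Python sketch captures that rule. The classifier is injected as a plain callable because this page does not describe how Akto invokes the LLM. The `LLMPromptRule` name and the assumption that confidence ranges from 0 to 1 are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical signature: (evaluation_prompt, content) -> confidence in [0, 1].
Classifier = Callable[[str, str], float]


@dataclass
class LLMPromptRule:
    evaluation_prompt: str
    threshold: float

    def __post_init__(self) -> None:
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError("threshold must be between 0 and 1 (assumed scale)")

    def should_block(self, content: str, classify: Classifier) -> bool:
        confidence = classify(self.evaluation_prompt, content)
        # Block only when confidence strictly exceeds the threshold.
        return confidence > self.threshold
```

In practice, `classify` would wrap a call to whatever model backs the rule. Stub functions keep the decision logic testable on its own.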

7. Intent Verification Using Base Prompt (AI Agents)

Intent verification validates agent behavior against a defined base prompt.

The evaluation checks whether agent requests align with the expected intent of the base prompt.
Requests that deviate from the defined intent are either flagged or blocked, depending on the configured policy behaviour.
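The following Python sketch outlines the decision flow. An alignment check compares the request with the base prompt, and a deviating request is then either flagged or blocked. The alignment check is a hypothetical callable, because this page does not describe how Akto evaluates intent. The enum names are also illustrative.

```python
from enum import Enum
from typing import Callable


class OnDeviation(Enum):
    FLAG = "flag"
    BLOCK = "block"


class Outcome(Enum):
    ALLOW = "allow"
    FLAG = "flag"
    BLOCK = "block"


# Hypothetical checker: (base_prompt, agent_request) -> True if the request fits the intent.
IntentChecker = Callable[[str, str], bool]


def verify_intent(
    base_prompt: str, agent_request: str, is_aligned: IntentChecker, on_deviation: OnDeviation
) -> Outcome:
    """Allow aligned requests; flag or block deviating ones per the policy setting."""
    if is_aligned(base_prompt, agent_request):
        return Outcome.ALLOW
    return Outcome.BLOCK if on_deviation is OnDeviation.BLOCK else Outcome.FLAG
```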

8. External Model-Based Evaluation

External evaluation allows integration with third-party or internal scoring systems.

Provide the external evaluation endpoint URL.
Configure a confidence threshold that determines enforcement actions based on the external response.
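This Python sketch shows what a call to an external evaluator might look like. The request body (`content`, `direction`) and the response field (`confidence`) are a hypothetical contract, not Akto's documented format. The strict greater-than comparison is assumed to mirror the LLM prompt-based rule. The HTTP call uses only the standard library and can be swapped for a stub in tests.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Optional

PostFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def http_post_json(url: str, payload: Dict[str, Any], timeout: float = 5.0) -> Dict[str, Any]:
    """POST the payload as JSON and decode the JSON reply."""
    body = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def external_verdict(
    endpoint: str, content: str, direction: str, threshold: float, post: Optional[PostFn] = None
) -> bool:
    """Return True when the external evaluator's confidence exceeds the threshold."""
    send = post or http_post_json
    # Hypothetical contract: {"content", "direction"} -> {"confidence": float}.
    reply = send(endpoint, {"content": content, "direction": direction})
    # A missing score is treated as 0.0, so the check fails open (design choice).
    confidence = float(reply.get("confidence", 0.0))
    return confidence > threshold
```

Whether to fail open or fail closed when the evaluator is unreachable or returns no score is a policy decision. Stricter deployments may prefer to block.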

9. Server and Application Settings

This section defines where the guardrail policy is enforced.

Select the MCP servers and agent servers where the policy applies.
Enable enforcement for requests, responses, or both.

Guardrail Deployment Scope Behaviour

By default, Akto does not apply a newly created guardrail policy to any server or agent server. A guardrail policy becomes active only after you explicitly select the target MCP servers and agent servers in the Server and Application Settings section.
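The scoping rule can be expressed as a short check. A policy applies only to servers that were explicitly selected, and only in the enabled directions. An empty selection, the default for a new policy, matches nothing. In this Python sketch, `DeploymentScope` and the server names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass
class DeploymentScope:
    enforce_requests: bool
    enforce_responses: bool
    # Empty by default: a new policy targets no server until you select some.
    mcp_servers: FrozenSet[str] = frozenset()
    agent_servers: FrozenSet[str] = frozenset()

    def applies_to(self, server: str, direction: str) -> bool:
        """True if the policy should evaluate this server's traffic in this direction."""
        if server not in self.mcp_servers and server not in self.agent_servers:
            return False
        if direction == "request":
            return self.enforce_requests
        if direction == "response":
            return self.enforce_responses
        raise ValueError(f"unknown direction: {direction!r}")
```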

Save the Guardrail Policy

After completing the required and optional configurations:

Select Create Guardrail to save the policy and apply enforcement to the selected scope.

What’s Next

You can modify, disable, or delete existing guardrail policies after creation.

To continue, see Manage Guardrail Policies.

Last updated 17 days ago