Hugging Face
Overview
This guide explains how to integrate the Akto AI Agent Proxy with a Hugging Face Private Inference Endpoint used to run private LLM inference. The proxy sits between the agent application and the model endpoint to monitor traffic, enforce guardrails, and log model invocations without modifying internal client code.
Akto AI Agent Proxy provides:
Guardrail enforcement on both requests and responses
Sensitive data redaction
Security threat detection
Hugging Face’s Private Inference Endpoint provides a dedicated, managed model endpoint accessible only via AWS PrivateLink from within a VPC. Unlike AWS Bedrock, Hugging Face does not automatically log full prompt and response conversations, so Akto must capture this traffic upstream.
Prerequisites
Before integrating Akto Proxy:
A working Hugging Face Private Inference Endpoint configured with PrivateLink.
AWS VPC where the endpoint service is reachable.
The AI agent and Akto Proxy deployed in the same VPC or with network access to the PrivateLink interfaces.
Access credentials for Hugging Face inference (API token).
Architecture Diagram
End user calls the AI agent API.
Akto Proxy intercepts the agent's inference requests (guardrail enforcement).
Proxy forwards to HF Private Inference Endpoint (via PrivateLink).
Responses pass back through Akto Proxy.
Akto logs, analyzes and optionally redacts or blocks results.
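As a text sketch of the same flow:

End user
  |
  v
AI agent application
  |   inference request (model base URL -> proxy)
  v
Akto AI Agent Proxy   <- guardrails, logging, redaction
  |   forwarded via AWS PrivateLink
  v
Hugging Face Private Inference Endpoint

(the response returns along the same path, back through the proxy)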
Setup Steps
Configure Hugging Face Private Inference Endpoint
Ensure the endpoint is set up with:
Model deployed
PrivateLink enabled
Correct AWS account and region
VPC interface endpoint created in your VPC
Hugging Face does not log full request/response content by itself. You must capture it upstream.
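To confirm the endpoint is deployed and retrieve the URL you will point the proxy at, here is a short sketch using the huggingface_hub client (the endpoint name and namespace are placeholders; it assumes you are authenticated, e.g. via the HF_TOKEN environment variable):

from huggingface_hub import get_inference_endpoint

# Placeholders: your endpoint's name and the org/user that owns it.
endpoint = get_inference_endpoint("my-private-endpoint", namespace="my-org")

print(endpoint.status)  # expect "running" once the model is deployed
print(endpoint.url)     # use this as the proxy's upstream target (APP_URL)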
Deploy Akto AI Agent Proxy
Deploy the proxy in the same VPC where:
End user traffic enters
The AI agent application runs
The PrivateLink interface to HF endpoint exists
Configure Proxy Environment
Here is an example config for the proxy:
AKTO_API_TOKEN: Akto ingestion token (from the Akto dashboard)
AKTO_API_BASE_URL: Akto proxy ingestion server
APP_URL: Upstream target (the HF Private Inference Endpoint URL)
LOG_LEVEL: Logging verbosity
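For illustration, a filled-in configuration might look like the following (all values are placeholders for your environment):

AKTO_API_TOKEN=<token copied from the Akto dashboard>
AKTO_API_BASE_URL=<your Akto ingestion server URL>
APP_URL=<your HF Private Inference Endpoint URL>
LOG_LEVEL=info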
Adjust Endpoint URL in Agent App
Update the AI agent’s inference call configuration:
Set model base URL to the Akto Proxy endpoint
Pass Hugging Face authentication headers through the proxy
For example:
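The minimal Python sketch below assumes the proxy listens at a hypothetical in-VPC address and the endpoint serves a text-generation model; adjust the request body to your model's task:

import os
import requests

# Hypothetical in-VPC address of the Akto Proxy; replace with your deployment's URL.
PROXY_BASE_URL = "http://akto-proxy.internal:8080"

response = requests.post(
    PROXY_BASE_URL,  # previously: the HF Private Inference Endpoint URL
    headers={
        # The HF token still authenticates against the endpoint; the proxy passes
        # it through. HF_API_TOKEN is a placeholder env var name.
        "Authorization": f"Bearer {os.environ['HF_API_TOKEN']}",
        "Content-Type": "application/json",
    },
    json={"inputs": "Summarize our Q3 report.", "parameters": {"max_new_tokens": 128}},
    timeout=30,
)
response.raise_for_status()
print(response.json())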
This ensures:
Traffic flows through Akto Proxy
Akto captures all inference calls
Validate Integration
Verify the end-to-end flow (a smoke-test sketch follows this list):
Send an inference request from the user
Akto Proxy receives and logs the call
Proxy enforces any guardrails
Proxy forwards to HF Private Endpoint
Response returns through Akto Proxy
Logs appear in Akto dashboard
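As a concrete smoke test, you can send a request containing an obviously sensitive value and then confirm how the proxy handled it (sketch only; the proxy address is a hypothetical placeholder, and the observed behavior depends on your configured rules):

import os
import requests

PROXY_BASE_URL = "http://akto-proxy.internal:8080"  # hypothetical proxy address

# Fake SSN-like value: with a redaction rule configured, the logged request in
# the Akto dashboard should show it masked; with a blocking rule, the proxy
# should reject the call instead.
payload = {"inputs": "My SSN is 123-45-6789. What benefits can I claim?"}

r = requests.post(
    PROXY_BASE_URL,
    headers={"Authorization": f"Bearer {os.environ['HF_API_TOKEN']}"},
    json=payload,
    timeout=30,
)
print(r.status_code, r.text)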
Look for:
Request/response pairs in proxy logs
Guardrail hits (if configured)
Redaction results
Security & Guardrails
Akto Proxy supports:
Request guardrails (input sanitization)
Response guardrails (filtering outputs)
Redaction of sensitive tokens or PII (illustrated in the sketch after this list)
Rate limiting and anomaly detection
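To show the kind of pattern-based masking a redaction rule performs, here is an illustrative Python sketch; this is generic logic, not Akto's implementation or rule syntax:

import re

# Example patterns a redaction rule might target; extend per your risk categories.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact(text: str) -> str:
    # Replace each match with a labeled mask so logs stay useful but safe.
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{label}]", text)
    return text

print(redact("Contact jane@example.com, SSN 123-45-6789."))
# -> Contact [REDACTED:email], SSN [REDACTED:ssn].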
Use our policy packs or define custom rules based on:
Content patterns
Risk categories
Endpoint sensitivity
Logging & Monitoring
Hugging Face Private Endpoints offer:
Operational logs (status, errors)
Metrics (latency, throughput)
They do not log conversation content by default.
Akto Proxy will log:
Full request and response traces
Guardrail decision events
Alerts and incidents
Metadata for analytics
Troubleshooting
Proxy cannot reach HF Endpoint: Check PrivateLink and VPC routing (a connectivity check is sketched below).
Auth failures: Verify Hugging Face API token headers are passed by proxy.
No logs in Akto: Confirm AKTO_API_TOKEN and ingestion config.
Guardrail not triggering: Validate rule pack configuration.
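For the first item above, a quick standard-library check run from the proxy host can confirm that the endpoint's PrivateLink DNS resolves and that TCP 443 is reachable (the hostname is a placeholder):

import socket

# Placeholder: replace with your HF Private Inference Endpoint hostname.
HOST = "your-endpoint.endpoints.huggingface.cloud"

# Inside the VPC this should resolve to the PrivateLink interface's private IP;
# a public IP or a resolution failure points at DNS/routing configuration.
print(socket.getaddrinfo(HOST, 443))

# TCP reachability on 443: raises on timeout or connection refusal.
with socket.create_connection((HOST, 443), timeout=5):
    print("TCP 443 reachable")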
Summary
By integrating Akto AI Agent Proxy in front of a Hugging Face Private Inference Endpoint:
You achieve guardrail enforcement without modifying the client code
You capture and monitor model invocation traffic
You gain conversation-level logging and observability that Hugging Face does not provide by default
Akto Proxy becomes the enforcement and observability layer for private HF model usage.