mirror of
https://github.com/katanemo/plano.git
synced 2026-06-17 15:25:17 +02:00
89 lines
4 KiB
ReStructuredText
89 lines
4 KiB
ReStructuredText
.. _prompt_guard:
|
|
|
|
Guardrails
|
|
==========
|
|
|
|
**Guardrails** are Plano's way of applying safety and validation checks to prompts before they reach your application logic. They are typically implemented as
|
|
filters in a :ref:`Filter Chain <filter_chain>` attached to an agent, so every request passes through a consistent processing layer.
|
|
|
|
|
|
Why Guardrails
|
|
--------------
|
|
Guardrails are essential for maintaining control over AI-driven applications. They help enforce organizational policies, ensure compliance with regulations
|
|
(like GDPR or HIPAA), and protect users from harmful or inappropriate content. In applications where prompts generate responses or trigger actions, guardrails
|
|
minimize risks like malicious inputs, off-topic queries, or misaligned outputs—adding a consistent layer of input scrutiny that makes interactions safer,
|
|
more reliable, and easier to reason about.
|
|
|
|
|
|
.. vale Vale.Spelling = NO
|
|
|
|
- **Jailbreak Prevention**: Detect and filter inputs that attempt to change LLM behavior, expose system prompts, or bypass safety policies.
|
|
- **Domain and Topicality Enforcement**: Ensure that agents only respond to prompts within an approved domain (for example, finance-only or healthcare-only use cases) and reject unrelated queries.
|
|
- **Dynamic Error Handling**: Provide clear error messages when requests violate policy, helping users correct their inputs.
|
|
|
|
How Guardrails Work
|
|
-------------------
|
|
|
|
In Plano, guardrails are usually implemented as filters that run as HTTP services. Each filter receives the incoming prompt and related metadata, evaluates it
|
|
against policy, and either lets the request continue (HTTP 200) or terminates it early with an appropriate error code (typically HTTP 4xx for policy failures).
|
|
|
|
The example below shows a simple, plain-Python HTTP service that acts as a topicality guardrail: it rejects any prompt that is not related to the
|
|
"weather" domain.
|
|
|
|
.. code-block:: python
|
|
:caption: Example topicality guard filter in plain Python (FastAPI)
|
|
|
|
from fastapi import FastAPI, Request, HTTPException
|
|
|
|
app = FastAPI()
|
|
|
|
ALLOWED_KEYWORDS = {"weather", "forecast", "temperature", "rain", "snow", "humidity"}
|
|
|
|
@app.post("/guardrails/topic")
|
|
async def topic_guard(request: Request):
|
|
body = await request.json()
|
|
# Expecting an OpenAI-style request body with messages
|
|
messages = body.get("messages", [])
|
|
user_content = " ".join(
|
|
m["content"] for m in messages if m.get("role") == "user"
|
|
).lower()
|
|
|
|
if not any(keyword in user_content for keyword in ALLOWED_KEYWORDS):
|
|
# Return 400 to indicate a policy failure (not a server error)
|
|
raise HTTPException(
|
|
status_code=400,
|
|
detail={
|
|
"error": "off_topic",
|
|
"message": "This assistant only answers weather-related questions.",
|
|
},
|
|
)
|
|
|
|
# If the prompt is on-topic, just pass the original body through
|
|
return body
|
|
|
|
|
|
To wire this guardrail into Plano, you define a listener of ``type: agent`` and attach a filter chain with a single filter that points
|
|
to the Python service above.
|
|
|
|
.. code-block:: yaml
|
|
:caption: Listener (type: agent) with a topicality guard filter
|
|
|
|
filters:
|
|
- id: topicality_guard
|
|
url: http://topic-guard:8000/guardrails/topic
|
|
|
|
listeners:
|
|
- type: agent
|
|
name: agent_listener
|
|
port: 8001
|
|
router: arch_agent_router
|
|
agents:
|
|
- id: rag_agent
|
|
description: virtual assistant for retrieval augmented generation tasks
|
|
filter_chain:
|
|
- topicality_guard
|
|
|
|
|
|
When a request arrives at ``agent_listener``, Plano will first call the ``topicality_guard`` filter. If the filter returns **HTTP 200**,
|
|
the request continues on to the configured agent or prompt target. If the filter returns **HTTP 400**, Plano returns that error back to
|
|
the caller and does not forward the request further—enforcing your domain guardrail without changing any application code.
|