Cotran/prompt guard doc (#147)

* replace prompt injection with jailbreak and remove toxic

Co Tran 2024-10-08 15:58:50 -07:00 committed by GitHub
parent fab71abdac
commit 22bc3d2798

@@ -17,12 +17,10 @@ Why Prompt Guard
- **Value Constraints**: Restricts inputs to valid ranges, lengths, or patterns to avoid unusual or incorrect responses.
- **Prompt Sanitization**
- **Injection Prevention**: Detects and filters inputs that might attempt injection attacks, like adding code or SQL queries in a prompt-based application.
- **Content Filtering**: Identifies and removes potentially harmful, sensitive, or inappropriate content from inputs to maintain safe interactions.
- **Jailbreak Prevention**: Detects and filters inputs that might attempt jailbreak attacks, such as altering the LLM's intended behavior, exposing the system prompt, or bypassing ethics and safety guardrails.
- **Intent Detection**
- **Behavioral Analysis**: Analyzes prompt intent to detect whether the input aligns with the function's intended use. This can help prevent unwanted behavior, such as attempts to bypass limitations or misuse system functions.
- **Sentiment and Tone Checking**: Examines the tone of prompts to ensure they align with application guidelines, useful in conversational systems and customer support interactions.
- **Dynamic Error Handling**
- **Automatic Correction**: Applies error-handling techniques to suggest corrections for minor input errors, such as typos or misformatted data.
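The guard checks listed above can be sketched as a simple pre-processing gate. This is a minimal illustration only: the function name, length limit, and keyword patterns below are hypothetical, and a production guard like Arch-Guard uses a trained classifier rather than regular expressions.

```python
import re

# Hypothetical length constraint ("Value Constraints")
MAX_PROMPT_LEN = 4096

# Crude patterns standing in for real detectors; a trained classifier
# replaces these in practice ("Injection Prevention" / "Jailbreak Prevention")
SQL_INJECTION_PAT = re.compile(r"(?i)\b(drop\s+table|union\s+select)\b")
JAILBREAK_PAT = re.compile(
    r"(?i)(ignore (all|previous) instructions|reveal your system prompt)"
)

def guard_prompt(prompt: str) -> tuple[bool, str]:
    """Return (allowed, reason) for a user prompt."""
    if len(prompt) > MAX_PROMPT_LEN:
        return False, "prompt exceeds maximum length"
    if SQL_INJECTION_PAT.search(prompt):
        return False, "possible injection attempt"
    if JAILBREAK_PAT.search(prompt):
        return False, "possible jailbreak attempt"
    return True, "ok"
```

Each branch corresponds to one of the bullet points above; the important design point is that the gate runs before the prompt ever reaches the LLM.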
@@ -42,7 +40,7 @@ Arch-Guard is designed to address this challenge.
What Is Arch-Guard
~~~~~~~~~~~~~~~~~~
`Arch-Guard <https://huggingface.co/collections/katanemolabs/arch-guard-6702bdc08b889e4bce8f446d>`_ is a robust classifier model specifically trained on a diverse corpus of prompt attacks.
It excels at detecting explicitly malicious prompts and assessing toxic content, providing an essential layer of security for LLM applications.
It excels at detecting explicitly malicious prompts, providing an essential layer of security for LLM applications.
By embedding Arch-Guard within the Arch architecture, we empower developers to build robust, LLM-powered applications while prioritizing security and safety. With Arch-Guard, you can navigate the complexities of prompt management with confidence, knowing you have a reliable defense against malicious input.
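Embedding such a classifier into an application can be sketched as below. The gate takes the classifier as a parameter so any prompt-attack model can be plugged in; the ``"JAILBREAK"`` label name and score threshold are assumptions, not Arch-Guard's published API, so check the actual checkpoint's labels on the Hugging Face collection page.

```python
def is_safe(prompt, classifier, threshold=0.5):
    """Gate a prompt using a prompt-attack classifier such as Arch-Guard.

    `classifier` is any callable returning [{"label": ..., "score": ...}],
    e.g. a transformers text-classification pipeline loaded with an
    Arch-Guard checkpoint. The "JAILBREAK" label name here is an
    assumption; label names vary by checkpoint.
    """
    result = classifier(prompt)[0]
    return not (result["label"] == "JAILBREAK" and result["score"] >= threshold)
```

Because the classifier is injected, the gate can be unit-tested with a stub before wiring up the real model.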