mirror of
https://github.com/katanemo/plano.git
synced 2026-05-03 04:42:49 +02:00
Cotran/prompt guard doc (#147)
* Replace prompt injection with jailbreak and remove toxic content mentions
This commit is contained in:
parent
fab71abdac
commit
22bc3d2798
1 changed file with 2 additions and 4 deletions
@@ -17,12 +17,10 @@ Why Prompt Guard
- **Value Constraints**: Restricts inputs to valid ranges, lengths, or patterns to avoid unusual or incorrect responses.
- **Prompt Sanitization**
- **Injection Prevention**: Detects and filters inputs that might attempt injection attacks, like adding code or SQL queries in a prompt-based application.
- **Content Filtering**: Identifies and removes potentially harmful, sensitive, or inappropriate content from inputs to maintain safe interactions.
- **Jailbreak Prevention**: Detects and filters inputs that might attempt jailbreak attacks, like altering the LLM's intended behavior, exposing the system prompt, or bypassing ethics safeguards.
- **Intent Detection**
- **Behavioral Analysis**: Analyzes prompt intent to detect if the input aligns with the function’s intended use. This can help prevent unwanted behavior, such as attempts to bypass limitations or misuse system functions.
- **Sentiment and Tone Checking**: Examines the tone of prompts to ensure they align with application guidelines, useful in conversational systems and customer support interactions.
- **Dynamic Error Handling**
- **Automatic Correction**: Applies error-handling techniques to suggest corrections for minor input errors, such as typos or malformed data.
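The value-constraint and sanitization checks described above can be sketched in a few lines of Python. This is a minimal illustration only: the length limit and regex patterns are assumptions for the example, not Arch's actual rules, and a production guard would rely on a trained classifier rather than hand-written patterns.

```python
import re

# Hypothetical limits and patterns, for illustration only.
MAX_LEN = 2000
INJECTION_PATTERNS = re.compile(
    r"(?i)(drop\s+table|union\s+select|ignore\s+previous\s+instructions)"
)

def sanitize_prompt(prompt: str) -> str:
    """Apply value constraints and basic injection filtering to a prompt."""
    # Value constraint: restrict input to a valid length range.
    if not 1 <= len(prompt) <= MAX_LEN:
        raise ValueError("prompt length outside allowed range")
    # Injection prevention: reject inputs matching disallowed patterns.
    if INJECTION_PATTERNS.search(prompt):
        raise ValueError("prompt matches a disallowed pattern")
    return prompt.strip()
```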
@@ -42,7 +40,7 @@ Arch-Guard is designed to address this challenge.
What Is Arch-Guard
~~~~~~~~~~~~~~~~~~
`Arch-Guard <https://huggingface.co/collections/katanemolabs/arch-guard-6702bdc08b889e4bce8f446d>`_ is a robust classifier model specifically trained on a diverse corpus of prompt attacks.
It excels at detecting explicitly malicious prompts and assessing toxic content, providing an essential layer of security for LLM applications.
It excels at detecting explicitly malicious prompts, providing an essential layer of security for LLM applications.
By embedding Arch-Guard within the Arch architecture, we empower developers to build robust, LLM-powered applications while prioritizing security and safety. With Arch-Guard, you can navigate the complexities of prompt management with confidence, knowing you have a reliable defense against malicious input.
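To make the guard's decision flow concrete, here is a rough sketch of how a classifier like Arch-Guard might gate requests before they reach the LLM. The keyword-based scorer and the threshold value are stand-in assumptions for this example; Arch-Guard itself is a trained transformer classifier, not a keyword heuristic.

```python
# Stand-in marker list, assumed for illustration only.
JAILBREAK_MARKERS = (
    "ignore previous instructions",
    "reveal your system prompt",
    "pretend you have no rules",
)

def attack_score(prompt: str) -> float:
    """Fraction of known jailbreak markers present in the prompt (0.0 to 1.0).

    A real deployment would replace this with the classifier's confidence
    that the prompt is a jailbreak attempt.
    """
    text = prompt.lower()
    hits = sum(marker in text for marker in JAILBREAK_MARKERS)
    return hits / len(JAILBREAK_MARKERS)

def guard(prompt: str, threshold: float = 0.3) -> bool:
    """Return True if the prompt should be blocked before reaching the LLM."""
    return attack_score(prompt) >= threshold
```

The key design point is that the guard runs as a separate, fast check in front of the model, so malicious input is rejected before it can influence generation.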