Cotran/prompt guard doc (#147)

* replace prompt injection with jailbreak and remove toxic

Co Tran 2024-10-08 15:58:50 -07:00 committed by GitHub
parent fab71abdac
commit 22bc3d2798

@@ -17,12 +17,10 @@ Why Prompt Guard
- **Value Constraints**: Restricts inputs to valid ranges, lengths, or patterns to avoid unusual or incorrect responses.
- **Prompt Sanitization**
- **Injection Prevention**: Detects and filters inputs that might attempt injection attacks, like adding code or SQL queries in a prompt-based application.
- **Content Filtering**: Identifies and removes potentially harmful, sensitive, or inappropriate content from inputs to maintain safe interactions.
- **Jailbreak Prevention**: Detects and filters inputs that might attempt jailbreak attacks, such as altering the LLM's intended behavior, exposing the system prompt, or bypassing ethics and safety guardrails.
- **Intent Detection**
- **Behavioral Analysis**: Analyzes prompt intent to detect whether the input aligns with the function's intended use. This can help prevent unwanted behavior, such as attempts to bypass limitations or misuse system functions.
- **Sentiment and Tone Checking**: Examines the tone of prompts to ensure they align with application guidelines, useful in conversational systems and customer support interactions.
- **Dynamic Error Handling**
- **Automatic Correction**: Applies error-handling techniques to suggest corrections for minor input errors, such as typos or misformatted data.
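The guard checks listed above can be sketched as a simple pre-processing gate. This is a minimal illustration only: the function name, length limit, and keyword patterns below are hypothetical, and a production guard like Arch-Guard uses a trained classifier rather than regular expressions.

```python
import re

# Hypothetical length constraint ("Value Constraints")
MAX_PROMPT_LEN = 4096

# Crude patterns standing in for real detectors; a trained classifier
# replaces these in practice ("Injection Prevention" / "Jailbreak Prevention")
SQL_INJECTION_PAT = re.compile(r"(?i)\b(drop\s+table|union\s+select)\b")
JAILBREAK_PAT = re.compile(
    r"(?i)(ignore (all|previous) instructions|reveal your system prompt)"
)

def guard_prompt(prompt: str) -> tuple[bool, str]:
    """Return (allowed, reason) for a user prompt."""
    if len(prompt) > MAX_PROMPT_LEN:
        return False, "prompt exceeds maximum length"
    if SQL_INJECTION_PAT.search(prompt):
        return False, "possible injection attempt"
    if JAILBREAK_PAT.search(prompt):
        return False, "possible jailbreak attempt"
    return True, "ok"
```

Each branch corresponds to one of the bullet points above; the important design point is that the gate runs before the prompt ever reaches the LLM.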
@@ -42,7 +40,7 @@ Arch-Guard is designed to address this challenge.
What Is Arch-Guard
~~~~~~~~~~~~~~~~~~
`Arch-Guard <https://huggingface.co/collections/katanemolabs/arch-guard-6702bdc08b889e4bce8f446d>`_ is a robust classifier model specifically trained on a diverse corpus of prompt attacks.
It excels at detecting explicitly malicious prompts and assessing toxic content, providing an essential layer of security for LLM applications.
It excels at detecting explicitly malicious prompts, providing an essential layer of security for LLM applications.
By embedding Arch-Guard within the Arch architecture, we empower developers to build robust, LLM-powered applications while prioritizing security and safety. With Arch-Guard, you can navigate the complexities of prompt management with confidence, knowing you have a reliable defense against malicious input.
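Embedding such a classifier into an application can be sketched as below. The gate takes the classifier as a parameter so any prompt-attack model can be plugged in; the ``"JAILBREAK"`` label name and score threshold are assumptions, not Arch-Guard's published API, so check the actual checkpoint's labels on the Hugging Face collection page.

```python
def is_safe(prompt, classifier, threshold=0.5):
    """Gate a prompt using a prompt-attack classifier such as Arch-Guard.

    `classifier` is any callable returning [{"label": ..., "score": ...}],
    e.g. a transformers text-classification pipeline loaded with an
    Arch-Guard checkpoint. The "JAILBREAK" label name here is an
    assumption; label names vary by checkpoint.
    """
    result = classifier(prompt)[0]
    return not (result["label"] == "JAILBREAK" and result["score"] >= threshold)
```

Because the classifier is injected, the gate can be unit-tested with a stub before wiring up the real model.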