.. _prompt_guard:

Prompt Guard
=============

**Prompt guard** is a security and validation feature in Arch that protects agents by filtering and analyzing prompts before they reach your application logic.
In applications where prompts generate responses or trigger actions based on user input, prompt guard reduces risks such as malicious inputs or misaligned outputs.
By adding a layer of input scrutiny, prompt guard makes interactions with agents safer, more reliable, and more accurate.

Why Prompt Guard
----------------

.. vale Vale.Spelling = NO

- **Prompt Sanitization via Arch-Guard**
    - **Jailbreak Prevention**: Detects and filters inputs that attempt jailbreak attacks, such as altering the LLM's intended behavior, exposing the system prompt, or bypassing ethical safeguards.

- **Dynamic Error Handling**
    - **Automatic Correction**: Applies error-handling techniques to suggest corrections for minor input errors, such as typos or misformatted data.
    - **Feedback Mechanism**: Provides informative error messages to users, helping them understand how to correct input mistakes or adhere to guidelines.

.. Note::
    Today, Arch supports jailbreak detection via Arch-Guard. We will be adding support for additional guards in Q1 2025 (including response guardrails).

What Is Arch-Guard
~~~~~~~~~~~~~~~~~~
`Arch-Guard <https://huggingface.co/collections/katanemo/arch-guard-6702bdc08b889e4bce8f446d>`_ is a robust classifier model specifically trained on a diverse corpus of prompt attacks.
It excels at detecting explicitly malicious prompts, providing an essential layer of security for LLM applications.

By embedding Arch-Guard within the Arch architecture, we empower developers to build robust, LLM-powered applications while prioritizing security and safety. With Arch-Guard, you can navigate the complexities of prompt management with confidence, knowing you have a reliable defense against malicious input.


Example Configuration
~~~~~~~~~~~~~~~~~~~~~
Here is an example of using Arch-Guard in Arch:

.. literalinclude:: includes/arch_config.yaml
    :language: yaml
    :linenos:
    :lines: 22-26
    :caption: Arch-Guard Example Configuration

How Arch-Guard Works
----------------------

#. **Pre-Processing Stage**

    When a request or prompt is received, Arch-Guard first validates it. If any violations are detected, the input is flagged, and a tailored error message may be returned.

#. **Error Handling and Feedback**

    If the prompt contains errors or does not meet certain criteria, the user receives immediate feedback or correction suggestions, enhancing usability and reducing the chance of repeated input mistakes.
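
The two stages above can be sketched in a few lines of Python. This is only an illustration of the control flow, not Arch's implementation: ``toy_jailbreak_classifier``, ``guard_prompt``, ``handle_request``, the phrase list, and the default message are all hypothetical names invented for this example, and the keyword check merely stands in for the trained Arch-Guard classifier.

.. code-block:: python

    from dataclasses import dataclass
    from typing import Callable, Optional

    # Hypothetical phrases for the stand-in classifier below. Arch-Guard itself
    # is a trained model, not a keyword list.
    SUSPICIOUS_PHRASES = (
        "ignore previous instructions",
        "reveal your system prompt",
    )

    # Hypothetical feedback message returned when a prompt is blocked.
    DEFAULT_MESSAGE = "Sorry, I can't help with that request."


    @dataclass
    class GuardResult:
        allowed: bool
        message: Optional[str] = None  # feedback for the user when blocked


    def toy_jailbreak_classifier(prompt: str) -> bool:
        """Stand-in classifier: True means the prompt looks like a jailbreak."""
        lowered = prompt.lower()
        return any(phrase in lowered for phrase in SUSPICIOUS_PHRASES)


    def guard_prompt(
        prompt: str,
        is_jailbreak: Callable[[str], bool] = toy_jailbreak_classifier,
        message: str = DEFAULT_MESSAGE,
    ) -> GuardResult:
        """Pre-processing stage: validate the prompt before anything else sees it."""
        if is_jailbreak(prompt):
            return GuardResult(allowed=False, message=message)
        return GuardResult(allowed=True)


    def handle_request(prompt: str, app: Callable[[str], str]) -> str:
        """Feedback stage: return the error message, or forward to the application."""
        result = guard_prompt(prompt)
        if not result.allowed:
            return result.message or DEFAULT_MESSAGE
        return app(prompt)

In this sketch the application callable is only invoked for prompts that pass the guard, which mirrors the idea that flagged input never reaches your application logic.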

Benefits of Using Arch-Guard
------------------------------

- **Enhanced Security**: Protects against injection attacks, harmful content, and misuse, securing both system and user data.

- **Better User Experience**: Clear feedback and error correction guide users toward correct input formats and constraints.


Summary
-------

Prompt guard is an essential tool for any prompt-based system that values security, accuracy, and compliance.
By implementing prompt guard, developers add a robust layer of input validation and security, leading to better-performing, more reliable, and safer applications.