plano/docs/source/llms/llms.rst
Salman Paracha 80c554ce1a
Docs branch - v1 of our tech docs (#69)
* added the first set of docs for our technical docs

* more docuemtnation changes

* added support for prompt processing and updated life of a request

* updated docs to including getting help sections and updated life of a request

* committing local changes for getting started guide, sample applications, and full reference spec for prompt-config

* updated configuration reference, added sample app skeleton, updated favico

* fixed the configuration refernce file, and made minor changes to the intent detection. commit v1 for now

---------

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local>
Co-authored-by: Adil Hafeez <adil@katanemo.com>
2024-09-20 17:08:42 -07:00

139 lines
No EOL
3 KiB
ReStructuredText
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

.. _llms_in_arch:
LLMs
====
Arch utilizes purpose-built, industry leading, LLMs to handle the crufty and undifferentiated
work around accepting, handling and processing prompts. The following
Arch-Guard
----------
LLM-powered applications are susceptible to prompt attacks, which are prompts intentionally designed to subvert the developers
intended behavior of the LLM.Arch-Guard is a classifier model trained on a large corpus of attacks, capable of detecting explicitly
malicious prompts (and toxicity).
The model is useful as a starting point for identifying and guardrailing against the most risky realistic inputs to
LLM-powered applications. Our goal in embedding Arch-Guard in the Arch gateway is to enable developers to focus on their business logic
and factor out security and safety outside application logic. Wth Arch-Guard= developers can take to significantly reduce prompt attack
risk while maintaining control over the user experience.
Below is our test results of the strength of our model as compared to Prompt-Guard from `Meta LLama <https://huggingface.co/meta-llama/Prompt-Guard-86M>`_.
.. list-table::
:header-rows: 1
:widths: 15 15 10 15 15
* - Dataset
- Jailbreak (Yes/No)
- Samples
- Prompt-Guard Accuracy
- Arch-Guard Accuracy
* - casual_conversation
- 0
- 3725
- 1.00
- 1.00
* - commonqa
- 0
- 9741
- 1.00
- 1.00
* - financeqa
- 0
- 1585
- 1.00
- 1.00
* - instruction
- 0
- 5000
- 1.00
- 1.00
* - jailbreak_behavior_benign
- 0
- 100
- 0.10
- 0.20
* - jailbreak_behavior_harmful
- 1
- 100
- 0.30
- 0.52
* - jailbreak_judge
- 1
- 300
- 0.33
- 0.49
* - jailbreak_prompts
- 1
- 79
- 0.99
- 1.00
* - jailbreak_tweet
- 1
- 1282
- 0.16
- 0.35
* - jailbreak_v
- 1
- 20000
- 0.90
- 0.93
* - jailbreak_vigil
- 1
- 104
- 1.00
- 1.00
* - mental_health
- 0
- 3512
- 1.00
- 1.00
* - telecom
- 0
- 4000
- 1.00
- 1.00
* - truthqa
- 0
- 817
- 1.00
- 0.98
* - weather
- 0
- 3121
- 1.00
- 1.00
.. list-table::
:header-rows: 1
:widths: 15 20
* - Statistics
- Overall performance
* - Overall Accuracy
- 0.93568 (Prompt-Guard), 0.95267 (Arch-Guard)
* - True positives rate (TPR)
- 0.8468 (Prompt-Guard), 0.8887 (Arch-Guard)
* - True negative rate (TNR)
- 0.9972 (Prompt-Guard), 0.9970 (Arch-Guard)
* - False positive rate (FPR)
- 0.0028 (Prompt-Guard), 0.0030 (Arch-Guard)
* - False negative rate (FNR)
- 0.1532 (Prompt-Guard), 0.1113 (Arch-Guard)
.. list-table::
:header-rows: 1
:widths: 15 20
* - Metrics
- Values
* - AUC
- 0.857 (Prompt-Guard), 0.880 (Arch-Guard)
* - Precision
- 0.715 (Prompt-Guard), 0.761 (Arch-Guard)
* - Recall
- 0.999 (Prompt-Guard), 0.999 (Arch-Guard)
Arch-FC1B
---------