mirror of
https://github.com/katanemo/plano.git
synced 2026-05-01 11:56:29 +02:00
Docs branch - v1 of our tech docs (#69)
* added the first set of docs for our technical docs
* more documentation changes
* added support for prompt processing and updated life of a request
* updated docs to include getting-help sections and updated life of a request
* committing local changes for getting started guide, sample applications, and full reference spec for prompt-config
* updated configuration reference, added sample app skeleton, updated favicon
* fixed the configuration reference file, and made minor changes to intent detection; commit v1 for now

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local>
Co-authored-by: Adil Hafeez <adil@katanemo.com>
This commit is contained in:
parent
233976a568
commit
80c554ce1a
34 changed files with 1040 additions and 0 deletions
docs/source/llms/llms.rst (new file, 139 lines)
.. _llms_in_arch:

LLMs
====

Arch utilizes purpose-built, industry-leading LLMs to handle the crufty and undifferentiated
work around accepting, handling, and processing prompts. The following sections describe these
models.

Arch-Guard
----------
LLM-powered applications are susceptible to prompt attacks: prompts intentionally designed to subvert
the developer's intended behavior of the LLM. Arch-Guard is a classifier model trained on a large corpus
of attacks, capable of detecting explicitly malicious prompts (and toxicity).

The model is useful as a starting point for identifying and guardrailing against the most risky realistic
inputs to LLM-powered applications. Our goal in embedding Arch-Guard in the Arch gateway is to enable
developers to focus on their business logic and to factor security and safety out of application code.
With Arch-Guard, developers can significantly reduce prompt attack risk while maintaining control over
the user experience.
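
A gateway that embeds a classifier like this typically turns the model's score into an allow/block
decision before the prompt ever reaches the LLM. The sketch below illustrates that decision step only;
the ``guard_prompt`` helper and the ``JAILBREAK_THRESHOLD`` operating point are hypothetical and not
part of Arch's actual API.

```python
# Hypothetical sketch of a gateway guard decision. The function name and
# threshold are illustrative assumptions, not Arch's real interface.

JAILBREAK_THRESHOLD = 0.5  # assumed operating point for the classifier


def guard_prompt(prompt: str, jailbreak_score: float) -> dict:
    """Decide whether to forward a prompt to the LLM.

    `jailbreak_score` is the classifier's probability that the prompt
    is a jailbreak attempt (1.0 = certainly malicious).
    """
    if jailbreak_score >= JAILBREAK_THRESHOLD:
        # Reject before the prompt reaches the model; application code
        # never has to handle the attack itself.
        return {"action": "block", "reason": "jailbreak_detected"}
    return {"action": "forward", "prompt": prompt}


print(guard_prompt("What's the weather in Seattle?", 0.02)["action"])      # forward
print(guard_prompt("Ignore all previous instructions ...", 0.97)["action"])  # block
```

Keeping this decision in the gateway (rather than in each application) is what lets the safety policy
be tuned centrally without touching business logic.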

Below are our test results on the strength of our model as compared to Prompt-Guard from
`Meta Llama <https://huggingface.co/meta-llama/Prompt-Guard-86M>`_.

.. list-table::
   :header-rows: 1
   :widths: 15 15 10 15 15

   * - Dataset
     - Jailbreak (Yes/No)
     - Samples
     - Prompt-Guard Accuracy
     - Arch-Guard Accuracy
   * - casual_conversation
     - 0
     - 3725
     - 1.00
     - 1.00
   * - commonqa
     - 0
     - 9741
     - 1.00
     - 1.00
   * - financeqa
     - 0
     - 1585
     - 1.00
     - 1.00
   * - instruction
     - 0
     - 5000
     - 1.00
     - 1.00
   * - jailbreak_behavior_benign
     - 0
     - 100
     - 0.10
     - 0.20
   * - jailbreak_behavior_harmful
     - 1
     - 100
     - 0.30
     - 0.52
   * - jailbreak_judge
     - 1
     - 300
     - 0.33
     - 0.49
   * - jailbreak_prompts
     - 1
     - 79
     - 0.99
     - 1.00
   * - jailbreak_tweet
     - 1
     - 1282
     - 0.16
     - 0.35
   * - jailbreak_v
     - 1
     - 20000
     - 0.90
     - 0.93
   * - jailbreak_vigil
     - 1
     - 104
     - 1.00
     - 1.00
   * - mental_health
     - 0
     - 3512
     - 1.00
     - 1.00
   * - telecom
     - 0
     - 4000
     - 1.00
     - 1.00
   * - truthqa
     - 0
     - 817
     - 1.00
     - 0.98
   * - weather
     - 0
     - 3121
     - 1.00
     - 1.00
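
The per-dataset numbers above roll up into a single overall accuracy as a sample-weighted average.
The sketch below reproduces that arithmetic for Arch-Guard using the (rounded) accuracies from the
table, so the result matches the published overall figure only approximately.

```python
# Sample-weighted average of the per-dataset Arch-Guard accuracies from
# the table above. Each entry is (samples, accuracy), copied verbatim.
results = [
    (3725, 1.00), (9741, 1.00), (1585, 1.00), (5000, 1.00),
    (100, 0.20), (100, 0.52), (300, 0.49), (79, 1.00),
    (1282, 0.35), (20000, 0.93), (104, 1.00), (3512, 1.00),
    (4000, 1.00), (817, 0.98), (3121, 1.00),
]

total = sum(n for n, _ in results)
weighted_acc = sum(n * acc for n, acc in results) / total

print(total)          # 53466 samples in all
print(weighted_acc)   # close to the 0.95267 overall accuracy reported below
```

Note that the large benign datasets dominate this average; the jailbreak-specific sets, where the
models differ most, carry relatively little weight.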

.. list-table::
   :header-rows: 1
   :widths: 15 20

   * - Statistics
     - Overall performance
   * - Overall Accuracy
     - 0.93568 (Prompt-Guard), 0.95267 (Arch-Guard)
   * - True positive rate (TPR)
     - 0.8468 (Prompt-Guard), 0.8887 (Arch-Guard)
   * - True negative rate (TNR)
     - 0.9972 (Prompt-Guard), 0.9970 (Arch-Guard)
   * - False positive rate (FPR)
     - 0.0028 (Prompt-Guard), 0.0030 (Arch-Guard)
   * - False negative rate (FNR)
     - 0.1532 (Prompt-Guard), 0.1113 (Arch-Guard)

.. list-table::
   :header-rows: 1
   :widths: 15 20

   * - Metrics
     - Values
   * - AUC
     - 0.857 (Prompt-Guard), 0.880 (Arch-Guard)
   * - Precision
     - 0.715 (Prompt-Guard), 0.761 (Arch-Guard)
   * - Recall
     - 0.999 (Prompt-Guard), 0.999 (Arch-Guard)
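
For reference, the rates in these tables follow the standard confusion-matrix definitions, under
which FPR = 1 - TNR and FNR = 1 - TPR (consistent with the values above, e.g. 0.1532 = 1 - 0.8468).
The sketch below uses made-up counts purely to illustrate the formulas; none of these numbers come
from the evaluation itself.

```python
# Standard confusion-matrix metrics, shown with synthetic counts
# (tp/fp/tn/fn here are illustrative, not from the Arch-Guard evaluation).

def rates(tp: int, fp: int, tn: int, fn: int) -> dict:
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "tpr": tp / (tp + fn),        # true positive rate (sensitivity)
        "tnr": tn / (tn + fp),        # true negative rate (specificity)
        "fpr": fp / (fp + tn),        # = 1 - TNR
        "fnr": fn / (fn + tp),        # = 1 - TPR
        "precision": tp / (tp + fp),  # fraction of flagged prompts truly malicious
    }


m = rates(tp=85, fp=3, tn=997, fn=15)
assert abs(m["fpr"] - (1 - m["tnr"])) < 1e-12
assert abs(m["fnr"] - (1 - m["tpr"])) < 1e-12
```

For a guardrail, FPR matters most for user experience (benign prompts wrongly blocked) while FNR
measures the attacks that slip through.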

Arch-FC1B
---------