mirror of
https://github.com/katanemo/plano.git
synced 2026-04-25 00:36:34 +02:00
* added the first set of docs for our technical docs * more docuemtnation changes * added support for prompt processing and updated life of a request * updated docs to including getting help sections and updated life of a request * committing local changes for getting started guide, sample applications, and full reference spec for prompt-config * updated configuration reference, added sample app skeleton, updated favico * fixed the configuration refernce file, and made minor changes to the intent detection. commit v1 for now --------- Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local> Co-authored-by: Adil Hafeez <adil@katanemo.com>
139 lines
No EOL
3 KiB
ReStructuredText
139 lines
No EOL
3 KiB
ReStructuredText
.. _llms_in_arch:
|
||
|
||
LLMs
|
||
====
|
||
Arch utilizes purpose-built, industry leading, LLMs to handle the crufty and undifferentiated
|
||
work around accepting, handling and processing prompts. The following
|
||
|
||
Arch-Guard
|
||
----------
|
||
LLM-powered applications are susceptible to prompt attacks, which are prompts intentionally designed to subvert the developer’s
|
||
intended behavior of the LLM.Arch-Guard is a classifier model trained on a large corpus of attacks, capable of detecting explicitly
|
||
malicious prompts (and toxicity).
|
||
|
||
The model is useful as a starting point for identifying and guardrailing against the most risky realistic inputs to
|
||
LLM-powered applications. Our goal in embedding Arch-Guard in the Arch gateway is to enable developers to focus on their business logic
|
||
and factor out security and safety outside application logic. Wth Arch-Guard= developers can take to significantly reduce prompt attack
|
||
risk while maintaining control over the user experience.
|
||
|
||
Below is our test results of the strength of our model as compared to Prompt-Guard from `Meta LLama <https://huggingface.co/meta-llama/Prompt-Guard-86M>`_.
|
||
|
||
.. list-table::
|
||
:header-rows: 1
|
||
:widths: 15 15 10 15 15
|
||
|
||
* - Dataset
|
||
- Jailbreak (Yes/No)
|
||
- Samples
|
||
- Prompt-Guard Accuracy
|
||
- Arch-Guard Accuracy
|
||
* - casual_conversation
|
||
- 0
|
||
- 3725
|
||
- 1.00
|
||
- 1.00
|
||
* - commonqa
|
||
- 0
|
||
- 9741
|
||
- 1.00
|
||
- 1.00
|
||
* - financeqa
|
||
- 0
|
||
- 1585
|
||
- 1.00
|
||
- 1.00
|
||
* - instruction
|
||
- 0
|
||
- 5000
|
||
- 1.00
|
||
- 1.00
|
||
* - jailbreak_behavior_benign
|
||
- 0
|
||
- 100
|
||
- 0.10
|
||
- 0.20
|
||
* - jailbreak_behavior_harmful
|
||
- 1
|
||
- 100
|
||
- 0.30
|
||
- 0.52
|
||
* - jailbreak_judge
|
||
- 1
|
||
- 300
|
||
- 0.33
|
||
- 0.49
|
||
* - jailbreak_prompts
|
||
- 1
|
||
- 79
|
||
- 0.99
|
||
- 1.00
|
||
* - jailbreak_tweet
|
||
- 1
|
||
- 1282
|
||
- 0.16
|
||
- 0.35
|
||
* - jailbreak_v
|
||
- 1
|
||
- 20000
|
||
- 0.90
|
||
- 0.93
|
||
* - jailbreak_vigil
|
||
- 1
|
||
- 104
|
||
- 1.00
|
||
- 1.00
|
||
* - mental_health
|
||
- 0
|
||
- 3512
|
||
- 1.00
|
||
- 1.00
|
||
* - telecom
|
||
- 0
|
||
- 4000
|
||
- 1.00
|
||
- 1.00
|
||
* - truthqa
|
||
- 0
|
||
- 817
|
||
- 1.00
|
||
- 0.98
|
||
* - weather
|
||
- 0
|
||
- 3121
|
||
- 1.00
|
||
- 1.00
|
||
|
||
.. list-table::
|
||
:header-rows: 1
|
||
:widths: 15 20
|
||
|
||
* - Statistics
|
||
- Overall performance
|
||
* - Overall Accuracy
|
||
- 0.93568 (Prompt-Guard), 0.95267 (Arch-Guard)
|
||
* - True positives rate (TPR)
|
||
- 0.8468 (Prompt-Guard), 0.8887 (Arch-Guard)
|
||
* - True negative rate (TNR)
|
||
- 0.9972 (Prompt-Guard), 0.9970 (Arch-Guard)
|
||
* - False positive rate (FPR)
|
||
- 0.0028 (Prompt-Guard), 0.0030 (Arch-Guard)
|
||
* - False negative rate (FNR)
|
||
- 0.1532 (Prompt-Guard), 0.1113 (Arch-Guard)
|
||
|
||
.. list-table::
|
||
:header-rows: 1
|
||
:widths: 15 20
|
||
|
||
* - Metrics
|
||
- Values
|
||
* - AUC
|
||
- 0.857 (Prompt-Guard), 0.880 (Arch-Guard)
|
||
* - Precision
|
||
- 0.715 (Prompt-Guard), 0.761 (Arch-Guard)
|
||
* - Recall
|
||
- 0.999 (Prompt-Guard), 0.999 (Arch-Guard)
|
||
|
||
|
||
|
||
Arch-FC1B
|
||
--------- |