**DP-Fusion-Lib** enables Large Language Model inference with mathematically provable differential privacy guarantees. Based on our research paper [*"DP-Fusion: Token-Level Differentially Private Inference for Large Language Models"*](https://arxiv.org/abs/2507.04531), this library provides formal (ε, δ)-DP protection for sensitive text generation workflows.
Differential privacy is the core foundation, but the library addresses the **full spectrum of text and document privacy**. Its **PII detection and rewriting tools** can be used **with or without DP**, offering practical privacy protection by default, and **formal guarantees** when DP is enabled.
**[Try the Live Demo](https://www.documentprivacy.com)**
**[Run the example Colab notebook](https://colab.research.google.com/drive/1hzoUAXF_jsFU9E3D6U5ceZdYZ3wfXPPd?usp=sharing)**
---
## Overview

Traditional privacy approaches for LLMs rely on heuristic redaction or post-hoc filtering. **DP-Fusion-Lib** goes further by providing a complete privacy framework with three levels of protection:
| Level | Approach | Protection |
|-------|----------|------------|
| 1 | **Redaction** | Automatic PII detection and replacement via Constitutional Tagger API |
| 2 | **Paraphrasing** | Context rewriting to obscure stylistic and contextual signatures |
| 3 | **Differential Privacy** | Formal (ε, δ)-DP guarantees via controlled distribution fusion |
The library achieves Level 3 protection by fusing token probability distributions from private and redacted contexts, bounding the Rényi divergence at each generation step to provide provable privacy guarantees.
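Per-step Rényi divergence bounds compose additively over the generated tokens and convert to an overall (ε, δ)-DP guarantee via the standard Rényi-DP conversion (Mironov, 2017). The sketch below shows that textbook formula; the function name and defaults are illustrative, and this is not necessarily how the library's `compute_epsilon_single_group` is implemented:

```python
import math

def epsilon_from_renyi(alpha, beta_per_step, num_steps, delta=1e-5):
    """Convert a per-token Renyi divergence bound (order alpha, budget
    beta_per_step) into an (epsilon, delta)-DP guarantee.

    Renyi-DP composes additively across steps; the final term is the
    standard RDP-to-(epsilon, delta) conversion.
    """
    total_rdp = beta_per_step * num_steps
    return total_rdp + math.log(1.0 / delta) / (alpha - 1.0)
```

For example, 100 generated tokens at `alpha=2.0` and `beta_per_step=0.01` yield ε ≈ 12.5 at δ = 1e-5; tightening the per-step budget tightens the final ε proportionally.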
---
## Technical Approach

DP-Fusion operates by maintaining two parallel contexts during generation:
- **Private Context**: The original document containing sensitive information
- **Public Context**: A redacted version with sensitive phrases replaced by placeholders
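As an illustration, a paired private/public context might look like the following. The placeholder format here is hypothetical; in practice the redacted version is produced by the Tagger API, not hand-written:

```python
# Hypothetical example of paired contexts; the actual placeholder
# scheme comes from the Tagger API, not manual substitution.
private_ctx = "Patient John Smith (DOB 03/14/1982) reports chest pain."
public_ctx = "Patient [NAME] (DOB [DATE]) reports chest pain."
```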
At each token generation step, the algorithm:
1. Computes next-token probability distributions for both contexts
2. Performs binary search to find the optimal mixing parameter λ
3. Ensures the fused distribution satisfies the Rényi divergence bound
4. Samples from the privacy-preserving mixed distribution
This approach guarantees that the output distribution is statistically similar regardless of the specific private information present, providing formal differential privacy.
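The per-token fusion step above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the library's implementation: the function names, the divergence order `alpha`, and the per-step budget `beta` are placeholders, and the mixture is the simple convex combination `λ·P_priv + (1−λ)·P_pub`:

```python
import numpy as np

def renyi_divergence(p, q, alpha=2.0):
    # D_alpha(P || Q) for discrete distributions over the same support.
    return np.log(np.sum(p**alpha * q**(1.0 - alpha))) / (alpha - 1.0)

def fuse_distributions(p_priv, p_pub, alpha=2.0, beta=0.01, iters=30):
    # Binary search for the largest mixing weight lambda whose mixture
    # stays within the per-step Renyi divergence budget beta.
    # lambda = 0 gives the pure public distribution (divergence 0),
    # so the lower bound of the search is always feasible.
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        mixed = mid * p_priv + (1.0 - mid) * p_pub
        if renyi_divergence(mixed, p_pub, alpha) <= beta:
            lo = mid
        else:
            hi = mid
    lam = lo
    return lam * p_priv + (1.0 - lam) * p_pub, lam
```

Binary search is valid here because the divergence of the mixture from the public distribution is zero at λ = 0 and non-decreasing in λ, so the feasible region is an interval starting at 0.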
---
## Installation
```bash
pip install dp-fusion-lib
```
**Hardware Requirements**: This library requires PyTorch. For production deployments, NVIDIA GPU acceleration is recommended. The `Qwen/Qwen2.5-7B-Instruct` model provides an effective balance between generation quality and privacy utility.
For a complete working example, see the [basic usage script](examples/basic_usage.py) or run the interactive [Jupyter notebook](examples/basic_usage.ipynb).
### Step 1: Initialize Components
The Tagger API provides automated sensitive phrase detection using Constitutional AI. API keys are available at [console.documentprivacy.com](https://console.documentprivacy.com).
```python
from dp_fusion_lib import DPFusion, Tagger, compute_epsilon_single_group
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the recommended base model (see Hardware Requirements above).
model_name = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```
**Recommendation**: For most applications, start with `alpha=2.0` and `beta=0.01`, then adjust based on your privacy-utility requirements.
---
## Data Privacy
While `dp-fusion-lib` executes entirely on your infrastructure, the Tagger API requires an external call for sensitive phrase detection. If you have strict data residency or compliance requirements, please contact me and I will help out.