
# Security Guide

## Overview

The NOMYO client provides end-to-end encryption for all communications between your application and the NOMYO inference endpoints. This ensures that your prompts and responses are protected from unauthorized access or interception.

## Encryption Mechanism

### Hybrid Encryption

The client uses a hybrid encryption approach combining:

1. **AES-256-GCM** for payload encryption (authenticated encryption)
2. **RSA-OAEP** for key exchange (4096-bit keys)

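AES-256-GCM encrypts the payload efficiently regardless of its size, while RSA-OAEP protects only the small AES key. The scheme therefore combines the performance of symmetric encryption with the security of asymmetric key exchange.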
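
As a rough illustration of the scheme described above, the sketch below encrypts a payload with a one-time AES-256-GCM key and wraps that key with RSA-OAEP. It uses the third-party `cryptography` package and is not the NOMYO client's internal implementation; the function names, the envelope layout, the 12-byte nonce, and the SHA-256 OAEP parameters are assumptions made for the example.

```python
import secrets

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Assumed OAEP parameters for this sketch (SHA-256 for both the hash and MGF1).
OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)


def hybrid_encrypt(plaintext: bytes, public_key) -> dict:
    """Encrypt with a one-time AES-256-GCM key, then wrap that key with RSA-OAEP."""
    aes_key = secrets.token_bytes(32)  # 256-bit key, used for this message only
    nonce = secrets.token_bytes(12)    # 96-bit GCM nonce
    ciphertext = AESGCM(aes_key).encrypt(nonce, plaintext, None)
    return {
        "wrapped_key": public_key.encrypt(aes_key, OAEP),
        "nonce": nonce,
        "ciphertext": ciphertext,      # ciphertext followed by the 16-byte GCM tag
    }


def hybrid_decrypt(envelope: dict, private_key) -> bytes:
    """Unwrap the AES key with RSA-OAEP, then decrypt and authenticate the payload."""
    aes_key = private_key.decrypt(envelope["wrapped_key"], OAEP)
    return AESGCM(aes_key).decrypt(envelope["nonce"], envelope["ciphertext"], None)
```

A key pair for experimenting can be created with `rsa.generate_private_key(public_exponent=65537, key_size=4096)` from `cryptography.hazmat.primitives.asymmetric`. If the ciphertext or nonce is modified, `hybrid_decrypt` raises `cryptography.exceptions.InvalidTag` instead of returning corrupted data.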

### Key Management

#### Automatic Key Generation

Keys are generated automatically in memory on first use or at session initialization. The client handles all key management internally.

#### Key Persistence (optional)

Keys *can* be saved to the `client_keys/` directory for reuse across sessions (e.g. in development scenarios), although this is not recommended:

```python
# Generate keys and save to file
await client.generate_keys(save_to_file=True, password="your-password")
```

#### Password Protection

Saved private keys should be password-protected in all environments:

```python
await client.generate_keys(save_to_file=True, password="your-strong-password")
```

## Secure Memory Protection

### Ephemeral AES Keys

- **Per-request encryption keys**: A unique AES-256 key is generated for each request
- **Automatic rotation**: AES keys are never reused - a fresh key is created for every encryption operation
- **Forward secrecy**: Compromise of one AES key only affects that single request
- **Secure generation**: AES keys are generated using cryptographically secure random number generation (`secrets.token_bytes`)
- **Automatic cleanup**: AES keys are zeroed from memory immediately after use
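
The points above can be pictured with a small standard-library sketch: a fresh 256-bit key per request from `secrets.token_bytes`, held in a mutable `bytearray` so that it can be overwritten afterwards. The helper names are hypothetical and this is not the client's internal code. In pure Python the wipe is best-effort, because the immutable `bytes` object returned by `secrets.token_bytes` may linger until it is garbage-collected.

```python
import secrets
from contextlib import contextmanager


def wipe(buf: bytearray) -> None:
    """Overwrite a mutable buffer with zeros in place (best-effort)."""
    for i in range(len(buf)):
        buf[i] = 0


@contextmanager
def ephemeral_aes_key(length: int = 32):
    """Yield a fresh random key and zero the buffer when the block exits."""
    key = bytearray(secrets.token_bytes(length))
    try:
        yield key
    finally:
        wipe(key)
```

Each `with ephemeral_aes_key() as key:` block receives its own key, and the buffer is zeroed even if the body raises.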

### Memory Protection

The client can use secure memory protection to:

- Prevent plaintext payloads from being swapped to disk
- Guarantee memory is zeroed after encryption
- Prevent sensitive data from being stored in memory dumps

## Security Best Practices

### Handle Responses with Minimal Lifetime

The library protects all intermediate crypto material (AES keys, raw plaintext bytes) in secure memory and zeros it immediately after use. However, the **final parsed response dict is returned to you** — and your code is responsible for minimizing how long it lives in memory.

This matters because the *response* is new data you didn't have before: a confidential analysis, PHI summary, or business-critical output. The longer it lives as a reachable Python object, the larger the exposure window from swap files, core dumps, memory inspection, or GC delay.

```python
# GOOD — extract what you need, then delete the response
response = await client.create(
    model="Qwen/Qwen3-0.6B",
    messages=[{"role": "user", "content": "Summarise patient record #1234"}],
    security_tier="maximum"
)
reply = response["choices"][0]["message"]["content"]
del response  # drop the full dict immediately

# ... use reply ...
del reply     # drop when done

# BAD — holding the full response dict longer than needed
response = await client.create(...)
# ... many lines of unrelated code ...
# response still reachable in memory the entire time
text = response["choices"][0]["message"]["content"]
```
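
One way to keep the exposure window small is to scope the response to a `with` block. The sketch below is a hypothetical helper, not part of the NOMYO API: it hands out the reply text and empties the response dict on exit, so the content is no longer reachable through that dict even if another variable still points to it.

```python
from contextlib import contextmanager


@contextmanager
def reply_scope(response: dict):
    """Yield the assistant reply, then empty the response dict on exit."""
    try:
        yield response["choices"][0]["message"]["content"]
    finally:
        response.clear()  # drops references only; memory is not zeroed
```

Inside a coroutine this could be used as `with reply_scope(await client.create(...)) as reply:` followed by the code that consumes `reply`.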

> **Note:** Python's `del` removes the reference and allows the GC to reclaim memory sooner, but does not zero the underlying bytes. For maximum protection (PHI, classified data), process the response and discard it as quickly as possible — do not store it in long-lived objects, class attributes, or logs.

### For Production Use

1. **Always use password protection** for private keys
2. **Keep private keys secure** (permissions set to 600 - owner-only access)
3. **Never share your private key**
4. **Verify server's public key fingerprint** before first use
5. **Use HTTPS connections** (never allow HTTP in production)
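
Items 2 and 4 can be checked programmatically. The sketch below uses only the standard library; the helper names are hypothetical, and the fingerprint is assumed to be the SHA-256 hex digest of the server's public key bytes as published out of band. This guide does not define the fingerprint format, so adapt the function to whatever your operator publishes. The permission check applies to POSIX systems.

```python
import hashlib
import hmac
import os
import stat


def has_owner_only_permissions(path: str) -> bool:
    """Return True if the file mode is exactly 600 (owner read/write only)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == 0o600


def fingerprint_matches(public_key_bytes: bytes, expected_hex: str) -> bool:
    """Compare an assumed SHA-256 fingerprint against a trusted value."""
    actual = hashlib.sha256(public_key_bytes).hexdigest()
    # Normalise common formatting (colons, upper case) before comparing.
    expected = expected_hex.replace(":", "").strip().lower()
    return hmac.compare_digest(actual, expected)
```

Running both checks at startup and refusing to continue on failure turns the checklist into an enforced policy rather than a convention.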

### Key Management

```python
# Generate keys with password protection
await client.generate_keys(
    save_to_file=True,
    key_dir="client_keys",
    password="strong-password-here"
)

# Load existing keys with password
await client.load_keys(
    "client_keys/private_key.pem",
    "client_keys/public_key.pem",
    password="strong-password-here"
)
```
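
The string literals above are placeholders. A common way to keep the real password out of source code is to read it from the environment at startup. The sketch below does this with a hypothetical variable name (`NOMYO_KEY_PASSWORD` is an assumption, not something the client reads by itself).

```python
import os


def key_password(env_var: str = "NOMYO_KEY_PASSWORD") -> str:
    """Return the key password from the environment, failing loudly if unset."""
    password = os.environ.get(env_var, "")
    if not password:
        raise RuntimeError(f"{env_var} is not set; refusing to use an empty password")
    return password
```

The result can then be passed as `password=key_password()` to `generate_keys` or `load_keys`.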

### Security Tiers

The client supports three security tiers:

- **Standard** (`security_tier="standard"`): General secure inference
- **High** (`security_tier="high"`): Sensitive business data
- **Maximum** (`security_tier="maximum"`): Maximum isolation (HIPAA PHI, classified data)

```python
# Use different security tiers
response = await client.create(
    model="Qwen/Qwen3-0.6B",
    messages=[{"role": "user", "content": "My sensitive data"}],
    security_tier="high"
)
```

## Security Features

### End-to-End Encryption

All prompts and responses are automatically encrypted and decrypted, ensuring:

- No plaintext data is sent over the network
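- Intermediate plaintext is held in memory only while it is being processed (the final parsed response is returned to your code, see [Handle Responses with Minimal Lifetime](#handle-responses-with-minimal-lifetime))
- No plaintext data is written to disk by the client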

### Forward Secrecy

Each request uses a unique AES key, ensuring that:

- Compromise of one request's key only affects that request
- Earlier and later requests remain secure even if the AES key of the current request is compromised

### Key Exchange Security

RSA-OAEP key exchange with 4096-bit keys provides:

- Strong encryption for key exchange
- Protection against known attacks
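- Forward secrecy for key material only as long as the RSA key pairs are generated per session; key pairs persisted to disk and reused across sessions do not provide it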

### Memory Protection

Secure memory features:

- Prevents plaintext from being swapped to disk
- Guarantees zeroing of sensitive memory
- Prevents memory dumps from containing sensitive data

## Hardware Attestation (TPM 2.0)

### What it is

When the server has a TPM 2.0 chip, every response includes a `tpm_attestation` block in `_metadata`. This is a cryptographically signed hardware quote proving:

- Which firmware and Secure Boot state the server is running (PCR 0, 7)
- Which application binary is running, when IMA is active (PCR 10)

The quote is signed by an ephemeral AIK (Attestation Identity Key) generated fresh for each request and tied to the `payload_id` nonce, so it cannot be replayed for a different request.

### Reading the attestation

```python
response = await client.create(
    model="Qwen/Qwen3-0.6B",
    messages=[{"role": "user", "content": "..."}],
    security_tier="maximum"
)

tpm = response["_metadata"].get("tpm_attestation", {})

if tpm.get("is_available"):
    print("PCR banks:", tpm["pcr_banks"])         # e.g. "sha256:0,7,10"
    print("PCR values:", tpm["pcr_values"])        # {bank: {index: hex}}
    print("AIK key:", tpm["aik_pubkey_b64"][:32], "...")
else:
    print("TPM not available on this server")
```

### Verifying the quote

The response is self-contained: `aik_pubkey_b64` is the full public key of the AIK that signed the quote, so no separate key-fetch round-trip is needed.

Verification steps using `tpm2-pytss`:

```python
import base64
from tpm2_pytss.types import TPM2B_PUBLIC, TPMT_SIGNATURE, TPM2B_ATTEST

# 1. Decode the quote components
aik_pub = TPM2B_PUBLIC.unmarshal(base64.b64decode(tpm["aik_pubkey_b64"]))[0]
quote   = TPM2B_ATTEST.unmarshal(base64.b64decode(tpm["quote_b64"]))[0]
sig     = TPMT_SIGNATURE.unmarshal(base64.b64decode(tpm["signature_b64"]))[0]

# 2. Verify the signature over the quote using the AIK public key
#    (use a TPM ESAPI verify_signature call or an offline RSA verify)

# 3. Inspect the qualifying_data inside the quote — it must match
#    SHA-256(payload_id.encode())[:16] to confirm this quote is for this request

# 4. Check pcr_values against your known-good baseline
```
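
Steps 3 and 4 of the outline above need no TPM library. The sketch below computes the expected qualifying data as described in step 3 and compares PCR values against a baseline. The function names and the baseline layout (mirroring the `{bank: {index: hex}}` shape of `pcr_values`, with the same key types) are assumptions; extracting the qualifying data from the parsed quote is left to your `tpm2-pytss` code.

```python
import hashlib
import hmac


def expected_qualifying_data(payload_id: str) -> bytes:
    """First 16 bytes of SHA-256 over the encoded payload_id."""
    return hashlib.sha256(payload_id.encode()).digest()[:16]


def nonce_matches(qualifying_data: bytes, payload_id: str) -> bool:
    """Constant-time comparison of the quote's qualifying data with the expected value."""
    return hmac.compare_digest(qualifying_data, expected_qualifying_data(payload_id))


def pcr_mismatches(pcr_values: dict, baseline: dict) -> list:
    """Return (bank, index) pairs whose value is missing or differs from the baseline."""
    mismatches = []
    for bank, expected in baseline.items():
        reported = pcr_values.get(bank, {})
        for index, expected_hex in expected.items():
            actual_hex = reported.get(index)
            if actual_hex is None or actual_hex.lower() != expected_hex.lower():
                mismatches.append((bank, index))
    return mismatches
```

An empty list from `pcr_mismatches` means every PCR in the baseline was reported with the expected value; PCRs that are reported but absent from the baseline are ignored by this sketch.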

> Full verification requires `tpm2-pytss` on the client side (`pip install tpm2-pytss` + `sudo apt install libtss2-dev`). It is optional — the attestation is informational unless your deployment policy requires verification.

### Behaviour per security tier

| Tier | TPM unavailable |
|------|----------------|
| `standard` | `tpm_attestation: {"is_available": false}` — request proceeds |
| `high` | same as standard |
| `maximum` | `ServiceUnavailableError` (HTTP 503) — request rejected |

For `maximum` tier, the server enforces TPM availability as a hard requirement. If your server has no TPM and you request `maximum`, catch the error explicitly:

```python
from nomyo import ServiceUnavailableError

try:
    response = await client.create(..., security_tier="maximum")
except ServiceUnavailableError as e:
    print("Server does not meet TPM requirements for maximum tier:", e)
```

## Compliance Considerations

### HIPAA Compliance

The client can be used for HIPAA-compliant applications when:

- Keys are password-protected
- HTTPS is used for all connections
- Private keys are stored securely
- Appropriate administrative, physical, and technical safeguards are in place for the rest of your system

### Data Classification

- **Standard**: General data
- **High**: Sensitive business data
- **Maximum**: Regulated or classified data (PHI, PII, etc.)
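
If your application already labels data by sensitivity, the mapping above can be made explicit in code. The sketch below is a hypothetical helper: the classification labels are invented for illustration, and only the tier strings `standard`, `high`, and `maximum` come from this guide. Unknown labels fall back to the strictest tier.

```python
# Hypothetical classification labels mapped to the tiers described in this guide.
TIER_BY_CLASSIFICATION = {
    "general": "standard",
    "business_sensitive": "high",
    "phi": "maximum",
    "pii": "maximum",
    "classified": "maximum",
}


def security_tier_for(classification: str) -> str:
    """Pick a security tier, defaulting to the strictest one for unknown labels."""
    return TIER_BY_CLASSIFICATION.get(classification.strip().lower(), "maximum")
```

The result can be passed directly as `security_tier=security_tier_for(label)` when calling `client.create`.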

## Security Testing

The client includes comprehensive security testing:

- All encryption/decryption operations are tested
- Key management is verified
- Memory protection is validated
- Error handling is tested

## Troubleshooting Security Issues

### Common Issues

1. **Key loading failures**: Ensure private key file permissions are correct (600)
2. **Connection errors**: Verify HTTPS is used for production
3. **Decryption failures**: Check that the correct API key is used
4. **Memory protection errors**: The `SecureMemory` module may not be available on all systems

### Debugging

The client adds metadata to responses that can help with debugging:

```python
response = await client.create(
    model="Qwen/Qwen3-0.6B",
    messages=[{"role": "user", "content": "Hello"}]
)

print(response["_metadata"])  # Contains security_tier, memory_protection, tpm_attestation, etc.
```

See [Hardware Attestation](#hardware-attestation-tpm-20) for details on the `tpm_attestation` field.

### Logging

Enable logging to see security operations:

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```
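
`basicConfig(level=logging.DEBUG)` turns on debug output for every library in the process. If that is too noisy, a narrower setup is possible, assuming the client logs under a logger named `nomyo` (an assumption based on the package name; check the name your installed version actually uses).

```python
import logging

# Keep other libraries at WARNING and raise verbosity for one logger only.
logging.basicConfig(level=logging.WARNING)
logging.getLogger("nomyo").setLevel(logging.DEBUG)  # logger name is an assumption
```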