Alpha Nerd edited this page 2026-04-18 15:42:22 +02:00

Security Guide

Overview

The NOMYO client provides end-to-end encryption for all communications between your application and the NOMYO inference endpoints. This ensures that your prompts and responses are protected from unauthorized access or interception.

Encryption Mechanism

Hybrid Encryption

The client uses a hybrid encryption approach combining:

  1. AES-256-GCM for payload encryption (authenticated encryption)
  2. RSA-OAEP for key exchange (4096-bit keys)

This provides both performance (AES for data) and security (RSA for key exchange).
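
To make the hybrid scheme concrete, here is a minimal sketch built on the third-party cryptography package. It is not the NOMYO client's implementation: the function names, the SHA-256 OAEP parameters, the 12-byte nonce, and the envelope layout are all assumptions chosen for illustration.

```python
import secrets

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# OAEP with SHA-256 (an assumption; the guide does not name the hash).
_OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)


def new_key_pair(key_size: int = 4096):
    """Generate an RSA private key; 4096 bits matches the guide."""
    return rsa.generate_private_key(public_exponent=65537, key_size=key_size)


def hybrid_encrypt(public_key, plaintext: bytes) -> dict:
    """Encrypt with a fresh AES-256-GCM key, then wrap that key with RSA-OAEP."""
    aes_key = secrets.token_bytes(32)  # 256-bit key, used for this message only
    nonce = secrets.token_bytes(12)    # 96-bit nonce, the usual size for GCM
    ciphertext = AESGCM(aes_key).encrypt(nonce, plaintext, None)
    wrapped_key = public_key.encrypt(aes_key, _OAEP)
    return {"wrapped_key": wrapped_key, "nonce": nonce, "ciphertext": ciphertext}


def hybrid_decrypt(private_key, envelope: dict) -> bytes:
    """Unwrap the AES key with RSA-OAEP, then decrypt and authenticate the payload."""
    aes_key = private_key.decrypt(envelope["wrapped_key"], _OAEP)
    return AESGCM(aes_key).decrypt(envelope["nonce"], envelope["ciphertext"], None)
```

The bulk data only ever touches the fast symmetric cipher, while the slow RSA operation is applied to 32 bytes of key material. GCM's authentication tag means tampering with the ciphertext makes decryption fail instead of returning corrupted plaintext.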

Key Management

Automatic Key Generation

Keys are generated automatically in memory on first use or at session initialization. The client handles all key management internally.

Key Persistence (optional)

Keys can be saved to the client_keys/ directory for reuse across sessions, e.g. in development scenarios (not recommended):

# Generate keys and save to file
await client.generate_keys(save_to_file=True, password="your-password")

Password Protection

Saved private keys should be password-protected in all environments:

await client.generate_keys(save_to_file=True, password="your-strong-password")
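
For readers who want to see what a password-protected private key file looks like, the sketch below uses the third-party cryptography package to produce and reload an encrypted PEM. How generate_keys encrypts the file internally is not documented here, so the helper names and the PKCS8 format are assumptions for illustration only.

```python
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa


def private_key_to_encrypted_pem(private_key, password: str) -> bytes:
    """Serialize a private key to PEM, encrypted under the given password."""
    return private_key.private_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PrivateFormat.PKCS8,
        encryption_algorithm=serialization.BestAvailableEncryption(password.encode()),
    )


def private_key_from_encrypted_pem(pem: bytes, password: str):
    """Load a password-protected PEM private key."""
    return serialization.load_pem_private_key(pem, password=password.encode())


def generate_encrypted_pem(password: str, key_size: int = 4096) -> bytes:
    """Generate a new RSA key and return it as an encrypted PEM (hypothetical helper)."""
    key = rsa.generate_private_key(public_exponent=65537, key_size=key_size)
    return private_key_to_encrypted_pem(key, password)
```

An encrypted PEM begins with the header "-----BEGIN ENCRYPTED PRIVATE KEY-----", which is a quick way to confirm that a saved key file is not stored in the clear.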

Secure Memory Protection

Ephemeral AES Keys

  • Per-request encryption keys: A unique AES-256 key is generated for each request
  • Automatic rotation: AES keys are never reused - a fresh key is created for every encryption operation
  • Forward secrecy: Compromise of one AES key only affects that single request
  • Secure generation: AES keys are generated using cryptographically secure random number generation (secrets.token_bytes)
  • Automatic cleanup: AES keys are zeroed from memory immediately after use
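
The per-request key lifecycle listed above can be sketched with the standard library alone. This illustrates the idea and is not the client's internal code: the ephemeral_key helper is hypothetical, and zeroing in CPython is best-effort, because the immutable bytes object returned by secrets.token_bytes is left to the garbage collector.

```python
import secrets
from contextlib import contextmanager


@contextmanager
def ephemeral_key(length: int = 32):
    """Yield a fresh random key in a mutable buffer and zero it on exit."""
    # Copy into a bytearray so the buffer can be overwritten in place later.
    key = bytearray(secrets.token_bytes(length))
    try:
        yield key
    finally:
        # Overwrite every byte, even if the body raised an exception.
        for i in range(len(key)):
            key[i] = 0
```

Each `with ephemeral_key() as key:` block gets its own 256-bit key, and the buffer holds only zeros once the block exits.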

Memory Protection

The client can use secure memory protection to:

  • Prevent plaintext payloads from being swapped to disk
  • Guarantee memory is zeroed after encryption
  • Prevent sensitive data from being stored in memory dumps

Security Best Practices

Handle Responses with Minimal Lifetime

The library protects all intermediate crypto material (AES keys, raw plaintext bytes) in secure memory and zeros it immediately after use. However, the final parsed response dict is returned to you — and your code is responsible for minimizing how long it lives in memory.

This matters because the response is new data you didn't have before: a confidential analysis, PHI summary, or business-critical output. The longer it lives as a reachable Python object, the larger the exposure window from swap files, core dumps, memory inspection, or GC delay.

# GOOD — extract what you need, then delete the response
response = await client.create(
    model="Qwen/Qwen3-0.6B",
    messages=[{"role": "user", "content": "Summarise patient record #1234"}],
    security_tier="maximum"
)
reply = response["choices"][0]["message"]["content"]
del response  # drop the full dict immediately

# ... use reply ...
del reply     # drop when done

# BAD — holding the full response dict longer than needed
response = await client.create(...)
# ... many lines of unrelated code ...
# response still reachable in memory the entire time
text = response["choices"][0]["message"]["content"]

Note: Python's del removes the reference and allows the GC to reclaim memory sooner, but does not zero the underlying bytes. For maximum protection (PHI, classified data), process the response and discard it as quickly as possible — do not store it in long-lived objects, class attributes, or logs.

For Production Use

  1. Always use password protection for private keys
  2. Keep private keys secure (permissions set to 600 - owner-only access)
  3. Never share your private key
  4. Verify server's public key fingerprint before first use
  5. Use HTTPS connections (never allow HTTP in production)
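
Items 2 and 4 can be checked programmatically. The following is a minimal standard-library sketch, assuming a POSIX file system and a SHA-256 fingerprint pinned as a hex string. The helper names are hypothetical, and the guide does not specify how the server's public key is obtained or encoded, so the fingerprint here is taken over whatever bytes you receive.

```python
import hashlib
import hmac
import os
import stat


def is_owner_only(path: str) -> bool:
    """Return True if the file mode is exactly 0o600 (owner read/write only)."""
    return stat.S_IMODE(os.stat(path).st_mode) == 0o600


def fingerprint_sha256(key_bytes: bytes) -> str:
    """Hex-encoded SHA-256 digest of the key bytes."""
    return hashlib.sha256(key_bytes).hexdigest()


def fingerprint_matches(key_bytes: bytes, pinned_hex: str) -> bool:
    """Compare the computed fingerprint with a pinned value in constant time."""
    return hmac.compare_digest(fingerprint_sha256(key_bytes), pinned_hex.strip().lower())
```

A deployment script could refuse to start when is_owner_only("client_keys/private_key.pem") is False, or when the server key's fingerprint differs from the value obtained out of band.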

Key Management

# Generate keys with password protection
await client.generate_keys(
    save_to_file=True,
    key_dir="client_keys",
    password="strong-password-here"
)

# Load existing keys with password
await client.load_keys(
    "client_keys/private_key.pem",
    "client_keys/public_key.pem",
    password="strong-password-here"
)
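
The password strings in the calls above are placeholders. One way to keep the real password out of source code is to read it from the environment and fall back to an interactive prompt. This is a sketch: the NOMYO_KEY_PASSWORD variable name and the get_key_password helper are hypothetical, not part of the client.

```python
import getpass
import os


def get_key_password(env_var: str = "NOMYO_KEY_PASSWORD") -> str:
    """Return the key password from the environment, or prompt for it."""
    password = os.environ.get(env_var)
    if password:
        return password
    # No variable set: ask interactively without echoing the input.
    return getpass.getpass("Private key password: ")
```

The result can then be passed as the password argument of generate_keys or load_keys.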

Security Tiers

The client supports three security tiers:

  • Standard: General secure inference
  • High: Sensitive business data
  • Maximum: Maximum isolation (HIPAA PHI, classified data)

# Use different security tiers
response = await client.create(
    model="Qwen/Qwen3-0.6B",
    messages=[{"role": "user", "content": "My sensitive data"}],
    security_tier="high"
)

Security Features

End-to-End Encryption

All prompts and responses are automatically encrypted and decrypted, ensuring:

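  • No plaintext data is sent over the network
  • Intermediate plaintext is held in secure memory and zeroed after use; only the final decrypted response is returned to your code
  • No plaintext data is written to disk by the client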

Forward Secrecy

Each request uses a unique AES key, ensuring that:

  • Compromise of one request's key only affects that request
  • Previous requests remain secure even if current key is compromised

Key Exchange Security

RSA-OAEP key exchange with 4096-bit keys provides:

  • Strong encryption for key exchange
  • Protection against known attacks
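  • Forward secrecy for key material only while RSA key pairs stay ephemeral (in memory, per session); a persisted private key that is later compromised exposes every AES key that was wrapped for it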

Memory Protection

Secure memory features:

  • Prevents plaintext from being swapped to disk
  • Guarantees zeroing of sensitive memory
  • Prevents memory dumps from containing sensitive data

Hardware Attestation (TPM 2.0)

What it is

When the server has a TPM 2.0 chip, every response includes a tpm_attestation block in _metadata. This is a cryptographically signed hardware quote proving:

  • Which firmware and Secure Boot state the server is running (PCR 0, 7)
  • Which application binary is running, when IMA is active (PCR 10)

The quote is signed by an ephemeral AIK (Attestation Identity Key) generated fresh for each request and tied to the payload_id nonce, so it cannot be replayed for a different request.

Reading the attestation

response = await client.create(
    model="Qwen/Qwen3-0.6B",
    messages=[{"role": "user", "content": "..."}],
    security_tier="maximum"
)

tpm = response["_metadata"].get("tpm_attestation", {})

if tpm.get("is_available"):
    print("PCR banks:", tpm["pcr_banks"])         # e.g. "sha256:0,7,10"
    print("PCR values:", tpm["pcr_values"])        # {bank: {index: hex}}
    print("AIK key:", tpm["aik_pubkey_b64"][:32], "...")
else:
    print("TPM not available on this server")

Verifying the quote

The response is self-contained: aik_pubkey_b64 is the full public key of the AIK that signed the quote, so no separate key-fetch round-trip is needed.

Verification steps using tpm2-pytss:

import base64
from tpm2_pytss.types import TPM2B_PUBLIC, TPMT_SIGNATURE, TPM2B_ATTEST

# 1. Decode the quote components
aik_pub = TPM2B_PUBLIC.unmarshal(base64.b64decode(tpm["aik_pubkey_b64"]))[0]
quote   = TPM2B_ATTEST.unmarshal(base64.b64decode(tpm["quote_b64"]))[0]
sig     = TPMT_SIGNATURE.unmarshal(base64.b64decode(tpm["signature_b64"]))[0]

# 2. Verify the signature over the quote using the AIK public key
#    (use a TPM ESAPI verify_signature call or an offline RSA verify)

# 3. Inspect the qualifying_data inside the quote — it must match
#    SHA-256(payload_id.encode())[:16] to confirm this quote is for this request

# 4. Check pcr_values against your known-good baseline

Full verification requires tpm2-pytss on the client side (pip install tpm2-pytss + sudo apt install libtss2-dev). It is optional — the attestation is informational unless your deployment policy requires verification.
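
Steps 3 and 4 need no TPM library once the qualifying data has been extracted from the quote. The sketch below uses only the standard library. The helper names are hypothetical. It assumes that SHA-256(payload_id.encode())[:16] means the first 16 bytes of the raw digest, and that pcr_values and your baseline share the {bank: {index: hex}} layout.

```python
import hashlib
import hmac


def expected_qualifying_data(payload_id: str) -> bytes:
    """First 16 bytes of SHA-256 over the UTF-8 encoded payload_id."""
    return hashlib.sha256(payload_id.encode()).digest()[:16]


def nonce_matches(qualifying_data: bytes, payload_id: str) -> bool:
    """Constant-time check that the quote was issued for this request."""
    return hmac.compare_digest(qualifying_data, expected_qualifying_data(payload_id))


def pcr_mismatches(pcr_values: dict, baseline: dict) -> list:
    """Return (bank, index) pairs that are missing or differ from the baseline."""
    mismatches = []
    for bank, expected in baseline.items():
        # Normalise indices to strings so 7 and "7" are treated alike.
        reported = {str(k): v for k, v in pcr_values.get(bank, {}).items()}
        for index, expected_hex in expected.items():
            actual = reported.get(str(index))
            if actual is None or actual.lower() != expected_hex.lower():
                mismatches.append((bank, str(index)))
    return mismatches
```

An empty list from pcr_mismatches means every PCR in your baseline was reported with the expected value. PCRs that the server reports but your baseline omits are ignored.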

Behaviour per security tier

Tier     | TPM unavailable
standard | tpm_attestation: {"is_available": false}; request proceeds
high     | same as standard
maximum  | ServiceUnavailableError (HTTP 503); request rejected

For maximum tier, the server enforces TPM availability as a hard requirement. If your server has no TPM and you request maximum, catch the error explicitly:

from nomyo import ServiceUnavailableError

try:
    response = await client.create(..., security_tier="maximum")
except ServiceUnavailableError as e:
    print("Server does not meet TPM requirements for maximum tier:", e)

Compliance Considerations

HIPAA Compliance

The client can be used for HIPAA-compliant applications when:

  • Keys are password-protected
  • HTTPS is used for all connections
  • Private keys are stored securely
  • Appropriate security measures are in place

Data Classification

  • Standard: General data
  • High: Sensitive business data
  • Maximum: Classified data (PHI, PII, etc.)
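
If your application already labels data by sensitivity, the mapping above can be encoded once so callers never pick a tier by hand. This is a sketch under stated assumptions: the classification labels ("general", "sensitive", "classified") and the tier_for helper are hypothetical. Only the tier strings come from this guide.

```python
# Hypothetical application-side labels mapped to the client's security tiers.
TIER_BY_CLASSIFICATION = {
    "general": "standard",
    "sensitive": "high",
    "classified": "maximum",
}


def tier_for(classification: str) -> str:
    """Return the security tier for a data classification label."""
    label = classification.strip().lower()
    if label not in TIER_BY_CLASSIFICATION:
        # Fail closed: an unknown label must not silently fall back to a lower tier.
        raise ValueError(f"unknown data classification: {classification!r}")
    return TIER_BY_CLASSIFICATION[label]
```

The result can be passed directly as the security_tier argument of client.create.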

Security Testing

The client includes comprehensive security testing:

  • All encryption/decryption operations are tested
  • Key management is verified
  • Memory protection is validated
  • Error handling is tested

Troubleshooting Security Issues

Common Issues

  1. Key loading failures: Ensure private key file permissions are correct (600)
  2. Connection errors: Verify HTTPS is used for production
  3. Decryption failures: Check that the correct API key is used
  4. Memory protection errors: SecureMemory module may not be available on all systems

Debugging

The client adds metadata to responses that can help with debugging:

response = await client.create(
    model="Qwen/Qwen3-0.6B",
    messages=[{"role": "user", "content": "Hello"}]
)

print(response["_metadata"])  # Contains security_tier, memory_protection, tpm_attestation, etc.

See Hardware Attestation for details on the tpm_attestation field.

Logging

Enable logging to see security operations:

import logging
logging.basicConfig(level=logging.DEBUG)
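
Setting DEBUG on the root logger also turns on debug output from every other library in the process. A narrower alternative is to raise the level only for the client's logger. This sketch assumes the client logs under the name "nomyo" (matching its import name), which is not confirmed by this guide.

```python
import logging

# Keep the rest of the application at WARNING.
logging.basicConfig(level=logging.WARNING)

# Enable verbose output only for the (assumed) client logger.
logging.getLogger("nomyo").setLevel(logging.DEBUG)
```

Whichever setup you use, keep debug logs out of long-term storage when handling sensitive prompts or responses.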