Security Guide
Overview
The NOMYO client provides end-to-end encryption for all communications between your application and the NOMYO inference endpoints. This ensures that your prompts and responses are protected from unauthorized access or interception.
Encryption Mechanism
Hybrid Encryption
The client uses a hybrid encryption approach combining:
- AES-256-GCM for payload encryption (authenticated encryption)
- RSA-OAEP for key exchange (4096-bit keys)
This provides both performance (AES for data) and security (RSA for key exchange).
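As an illustration of the hybrid pattern (not the client's actual implementation), the sketch below uses the third-party cryptography package: a fresh AES-256-GCM key encrypts the payload, and RSA-OAEP wraps that key for the recipient. The function names, the envelope layout, and the choice of SHA-256 inside OAEP are assumptions made for this example.

```python
import secrets

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Assumption: OAEP with SHA-256 for both the hash and MGF1.
OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)


def hybrid_encrypt(public_key, plaintext: bytes) -> dict:
    """Encrypt plaintext with a one-off AES key, then wrap that key with RSA."""
    aes_key = secrets.token_bytes(32)   # 256-bit key, used for this message only
    nonce = secrets.token_bytes(12)     # 96-bit nonce, the usual size for GCM
    ciphertext = AESGCM(aes_key).encrypt(nonce, plaintext, None)
    wrapped_key = public_key.encrypt(aes_key, OAEP)
    return {"wrapped_key": wrapped_key, "nonce": nonce, "ciphertext": ciphertext}


def hybrid_decrypt(private_key, envelope: dict) -> bytes:
    """Unwrap the AES key with RSA, then decrypt and authenticate the payload."""
    aes_key = private_key.decrypt(envelope["wrapped_key"], OAEP)
    return AESGCM(aes_key).decrypt(envelope["nonce"], envelope["ciphertext"], None)
```

A matching key pair for experimenting can be created with rsa.generate_private_key(public_exponent=65537, key_size=4096). Generating a 4096-bit key takes noticeably longer than the per-message AES work, which is why the RSA step is limited to wrapping the 32-byte AES key.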
Key Management
Automatic Key Generation
Keys are automatically generated in memory on first use/session init. The client handles all key management internally.
Key Persistence (optional)
Keys can be saved to the client_keys/ directory for reuse across sessions, e.g. in development scenarios (not recommended):
# Generate keys and save to file
await client.generate_keys(save_to_file=True, password="your-password")
Password Protection
Saved private keys should be password-protected in all environments:
await client.generate_keys(save_to_file=True, password="your-strong-password")
Secure Memory Protection
Ephemeral AES Keys
- Per-request encryption keys: A unique AES-256 key is generated for each request
- Automatic rotation: AES keys are never reused - a fresh key is created for every encryption operation
- Forward secrecy: Compromise of one AES key only affects that single request
- Secure generation: AES keys are generated using cryptographically secure random number generation (secrets.token_bytes)
- Automatic cleanup: AES keys are zeroed from memory immediately after use
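The per-request key lifecycle described above can be sketched with the standard library alone. This is a simplified illustration, not the client's internal code: the names ephemeral_key and wipe are hypothetical, and a bytearray is assumed so that the key bytes can be overwritten in place.

```python
import secrets
from contextlib import contextmanager


def wipe(buf: bytearray) -> None:
    """Overwrite a mutable buffer with zeros in place."""
    for i in range(len(buf)):
        buf[i] = 0


@contextmanager
def ephemeral_key(length: int = 32):
    """Yield a fresh random key and zero it when the block exits."""
    # The temporary immutable bytes object returned by token_bytes is not
    # wiped here; a real implementation needs lower-level memory control.
    key = bytearray(secrets.token_bytes(length))
    try:
        yield key
    finally:
        wipe(key)
```

Each `with ephemeral_key() as key:` block gets its own 32-byte key, and the buffer holds only zeros once the block exits, even if an exception is raised inside it.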
Memory Protection
The client can use secure memory protection to:
- Prevent plaintext payloads from being swapped to disk
- Guarantee memory is zeroed after encryption
- Prevent sensitive data from being stored in memory dumps
Security Best Practices
Handle Responses with Minimal Lifetime
The library protects all intermediate crypto material (AES keys, raw plaintext bytes) in secure memory and zeros it immediately after use. However, the final parsed response dict is returned to you — and your code is responsible for minimizing how long it lives in memory.
This matters because the response is new data you didn't have before: a confidential analysis, PHI summary, or business-critical output. The longer it lives as a reachable Python object, the larger the exposure window from swap files, core dumps, memory inspection, or GC delay.
# GOOD — extract what you need, then delete the response
response = await client.create(
    model="Qwen/Qwen3-0.6B",
    messages=[{"role": "user", "content": "Summarise patient record #1234"}],
    security_tier="maximum"
)
reply = response["choices"][0]["message"]["content"]
del response # drop the full dict immediately
# ... use reply ...
del reply # drop when done
# BAD — holding the full response dict longer than needed
response = await client.create(...)
# ... many lines of unrelated code ...
# response still reachable in memory the entire time
text = response["choices"][0]["message"]["content"]
Note: Python's del removes the reference and allows the GC to reclaim memory sooner, but does not zero the underlying bytes. For maximum protection (PHI, classified data), process the response and discard it as quickly as possible; do not store it in long-lived objects, class attributes, or logs.
For Production Use
- Always use password protection for private keys
- Keep private keys secure (permissions set to 600 - owner-only access)
- Never share your private key
- Verify server's public key fingerprint before first use
- Use HTTPS connections (never allow HTTP in production)
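The file permission recommendation can be checked programmatically before keys are loaded. The following is a minimal sketch for POSIX systems using only the standard library; the helper names are hypothetical and not part of the NOMYO client.

```python
import os
import stat


def is_owner_only(path: str) -> bool:
    """Return True if group and others have no permissions on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & 0o077) == 0


def restrict_to_owner(path: str) -> None:
    """Set permissions to 600 (owner read/write only)."""
    os.chmod(path, 0o600)
```

A start-up check could call is_owner_only("client_keys/private_key.pem") and refuse to continue, or call restrict_to_owner, when it returns False.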
Key Management
# Generate keys with password protection
await client.generate_keys(
    save_to_file=True,
    key_dir="client_keys",
    password="strong-password-here"
)
# Load existing keys with password
await client.load_keys(
    "client_keys/private_key.pem",
    "client_keys/public_key.pem",
    password="strong-password-here"
)
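The literal passwords in these snippets are placeholders. One way to keep the real password out of source code is to read it from the environment and fall back to an interactive prompt, as sketched here. The helper and the variable name NOMYO_KEY_PASSWORD are hypothetical choices for this example, not something the client defines.

```python
import getpass
import os


def get_key_password(env_var: str = "NOMYO_KEY_PASSWORD") -> str:
    """Read the key password from the environment, else prompt for it."""
    # NOMYO_KEY_PASSWORD is a hypothetical variable name chosen for this sketch.
    password = os.environ.get(env_var)
    if not password:
        # getpass does not echo the typed password to the terminal.
        password = getpass.getpass("Private key password: ")
    return password
```

The result can then be passed as password=get_key_password() to generate_keys or load_keys.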
Security Tiers
The client supports three security tiers:
- Standard: General secure inference
- High: Sensitive business data
- Maximum: Maximum isolation (HIPAA PHI, classified data)
# Use different security tiers
response = await client.create(
    model="Qwen/Qwen3-0.6B",
    messages=[{"role": "user", "content": "My sensitive data"}],
    security_tier="high"
)
Security Features
End-to-End Encryption
All prompts and responses are automatically encrypted and decrypted, ensuring:
- No plaintext data is sent over the network
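- No plaintext data is kept in library-managed memory longer than needed (the returned response dict is your responsibility, see Handle Responses with Minimal Lifetime)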
- No plaintext data is stored on disk
Forward Secrecy
Each request uses a unique AES key, ensuring that:
- Compromise of one request's key only affects that request
- Previous requests remain secure even if current key is compromised
Key Exchange Security
RSA-OAEP key exchange with 4096-bit keys provides:
- Strong encryption for key exchange
- Protection against known attacks
- Forward secrecy for key material
Memory Protection
Secure memory features:
- Prevents plaintext from being swapped to disk
- Guarantees zeroing of sensitive memory
- Prevents memory dumps from containing sensitive data
Hardware Attestation (TPM 2.0)
What it is
When the server has a TPM 2.0 chip, every response includes a tpm_attestation block in _metadata. This is a cryptographically signed hardware quote proving:
- Which firmware and Secure Boot state the server is running (PCR 0, 7)
- Which application binary is running, when IMA is active (PCR 10)
The quote is signed by an ephemeral AIK (Attestation Identity Key) generated fresh for each request and tied to the payload_id nonce, so it cannot be replayed for a different request.
Reading the attestation
response = await client.create(
    model="Qwen/Qwen3-0.6B",
    messages=[{"role": "user", "content": "..."}],
    security_tier="maximum"
)
tpm = response["_metadata"].get("tpm_attestation", {})
if tpm.get("is_available"):
    print("PCR banks:", tpm["pcr_banks"])    # e.g. "sha256:0,7,10"
    print("PCR values:", tpm["pcr_values"])  # {bank: {index: hex}}
    print("AIK key:", tpm["aik_pubkey_b64"][:32], "...")
else:
    print("TPM not available on this server")
Verifying the quote
The response is self-contained: aik_pubkey_b64 is the full public key of the AIK that signed the quote, so no separate key-fetch round-trip is needed.
Verification steps using tpm2-pytss:
import base64
from tpm2_pytss.types import TPM2B_PUBLIC, TPMT_SIGNATURE, TPM2B_ATTEST
# 1. Decode the quote components
aik_pub = TPM2B_PUBLIC.unmarshal(base64.b64decode(tpm["aik_pubkey_b64"]))[0]
quote = TPM2B_ATTEST.unmarshal(base64.b64decode(tpm["quote_b64"]))[0]
sig = TPMT_SIGNATURE.unmarshal(base64.b64decode(tpm["signature_b64"]))[0]
# 2. Verify the signature over the quote using the AIK public key
# (use a TPM ESAPI verify_signature call or an offline RSA verify)
# 3. Inspect the qualifying_data inside the quote — it must match
# SHA-256(payload_id.encode())[:16] to confirm this quote is for this request
# 4. Check pcr_values against your known-good baseline
Full verification requires tpm2-pytss on the client side (pip install tpm2-pytss + sudo apt install libtss2-dev). It is optional: the attestation is informational unless your deployment policy requires verification.
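Steps 3 and 4 of the verification outline need no TPM library and can be sketched with the standard library. The helpers below are hypothetical. They assume you have already extracted the raw qualifying data bytes from the unmarshalled quote and that you maintain your own known-good PCR baseline in the same {bank: {index: hex}} shape as pcr_values. Signature verification (step 2) is not covered.

```python
import hashlib
import hmac


def expected_qualifying_data(payload_id: str) -> bytes:
    """First 16 bytes of SHA-256 over the encoded payload_id."""
    return hashlib.sha256(payload_id.encode()).digest()[:16]


def nonce_matches(payload_id: str, qualifying_data: bytes) -> bool:
    """Check that the quote was produced for this request's payload_id."""
    return hmac.compare_digest(expected_qualifying_data(payload_id), qualifying_data)


def pcr_mismatches(pcr_values: dict, baseline: dict) -> list:
    """Return (bank, index) pairs whose reported value differs from the baseline."""
    mismatches = []
    for bank, expected in baseline.items():
        reported = pcr_values.get(bank, {})
        for index, value in expected.items():
            # Compare hex strings case-insensitively; a missing PCR is a mismatch.
            if str(reported.get(index, "")).lower() != str(value).lower():
                mismatches.append((bank, index))
    return mismatches
```

An empty list from pcr_mismatches(tpm["pcr_values"], baseline) together with a True result from nonce_matches indicates that the quote matches both the request and the expected platform state, provided the signature check in step 2 has also passed.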
Behaviour per security tier
| Tier | TPM unavailable |
|---|---|
| standard | tpm_attestation: {"is_available": false}, request proceeds |
| high | same as standard |
| maximum | ServiceUnavailableError (HTTP 503), request rejected |
For maximum tier, the server enforces TPM availability as a hard requirement. If your server has no TPM and you request maximum, catch the error explicitly:
from nomyo import ServiceUnavailableError
try:
    response = await client.create(..., security_tier="maximum")
except ServiceUnavailableError as e:
    print("Server does not meet TPM requirements for maximum tier:", e)
Compliance Considerations
HIPAA Compliance
The client can be used for HIPAA-compliant applications when:
- Keys are password-protected
- HTTPS is used for all connections
- Private keys are stored securely
- Appropriate security measures are in place
Data Classification
- Standard: General data
- High: Sensitive business data
- Maximum: Classified data (PHI, PII, etc.)
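To apply this classification consistently, an application can map its own data labels to a security_tier value in one place. The sketch below is illustrative only. The labels "general", "sensitive", and "classified" are hypothetical names for the three classes listed above, while the tier strings follow the values used elsewhere in this guide.

```python
# Hypothetical application-side labels mapped to the client's tier names.
TIER_BY_CLASSIFICATION = {
    "general": "standard",
    "sensitive": "high",
    "classified": "maximum",
}


def tier_for(classification: str) -> str:
    """Return the security tier for a data classification label."""
    try:
        return TIER_BY_CLASSIFICATION[classification.lower()]
    except KeyError:
        # Fail loudly instead of silently picking a weaker tier.
        raise ValueError(f"Unknown data classification: {classification!r}") from None
```

The result can be passed directly, for example security_tier=tier_for("classified"). Raising on unknown labels is a deliberate choice here so that unclassified data is never sent at a lower tier by accident.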
Security Testing
The client includes comprehensive security testing:
- All encryption/decryption operations are tested
- Key management is verified
- Memory protection is validated
- Error handling is tested
Troubleshooting Security Issues
Common Issues
- Key loading failures: Ensure private key file permissions are correct (600)
- Connection errors: Verify HTTPS is used for production
- Decryption failures: Check that the correct API key is used
- Memory protection errors: SecureMemory module may not be available on all systems
Debugging
The client adds metadata to responses that can help with debugging:
response = await client.create(
    model="Qwen/Qwen3-0.6B",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response["_metadata"]) # Contains security_tier, memory_protection, tpm_attestation, etc.
See Hardware Attestation for details on the tpm_attestation field.
Logging
Enable logging to see security operations:
import logging
logging.basicConfig(level=logging.DEBUG)
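Setting the root logger to DEBUG also enables debug output from every other library in the process, which may end up in log files. A narrower setup is sketched below. It assumes the client emits its messages under a logger named "nomyo", which this guide does not confirm, so adjust the name to whatever logger the client actually uses.

```python
import logging

# Keep the root logger at WARNING so unrelated libraries stay quiet.
logging.basicConfig(
    level=logging.WARNING,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

# Assumption: the client logs under the "nomyo" logger name.
logging.getLogger("nomyo").setLevel(logging.DEBUG)
```

Because the guidance above advises against storing responses in logs, it is worth reviewing debug output in a development environment before enabling it anywhere that handles real data.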