The NOMYO client provides end-to-end encryption for all communications between your application and the NOMYO inference endpoints. This ensures that your prompts and responses are protected from unauthorized access or interception.
## Encryption Mechanism
### Hybrid Encryption
The client uses a hybrid encryption approach combining:
1.**AES-256-GCM** for payload encryption (authenticated encryption)
2.**RSA-OAEP** for key exchange (4096-bit keys)
This provides both performance (AES for data) and security (RSA for key exchange).
### Key Management
#### Automatic Key Generation
Keys are automatically generated in memory on first use/session init. The client handles all key management internally.
#### Key Persistence (optional)
Keys *can* be saved to the `client_keys/` directory for reuse (i.e. in dev scenarios) across sessions [not recommend]:
The library protects all intermediate crypto material (AES keys, raw plaintext bytes) in secure memory and zeros it immediately after use. However, the **final parsed response dict is returned to you** — and your code is responsible for minimizing how long it lives in memory.
This matters because the *response* is new data you didn't have before: a confidential analysis, PHI summary, or business-critical output. The longer it lives as a reachable Python object, the larger the exposure window from swap files, core dumps, memory inspection, or GC delay.
```python
# GOOD — extract what you need, then delete the response
response = await client.create(
model="Qwen/Qwen3-0.6B",
messages=[{"role": "user", "content": "Summarise patient record #1234"}],
# BAD — holding the full response dict longer than needed
response = await client.create(...)
# ... many lines of unrelated code ...
# response still reachable in memory the entire time
text = response["choices"][0]["message"]["content"]
```
> **Note:** Python's `del` removes the reference and allows the GC to reclaim memory sooner, but does not zero the underlying bytes. For maximum protection (PHI, classified data), process the response and discard it as quickly as possible — do not store it in long-lived objects, class attributes, or logs.
When the server has a TPM 2.0 chip, every response includes a `tpm_attestation` block in `_metadata`. This is a cryptographically signed hardware quote proving:
- Which firmware and Secure Boot state the server is running (PCR 0, 7)
- Which application binary is running, when IMA is active (PCR 10)
The quote is signed by an ephemeral AIK (Attestation Identity Key) generated fresh for each request and tied to the `payload_id` nonce, so it cannot be replayed for a different request.
The response is self-contained: `aik_pubkey_b64` is the full public key of the AIK that signed the quote, so no separate key-fetch round-trip is needed.
Verification steps using `tpm2-pytss`:
```python
import base64
from tpm2_pytss.types import TPM2B_PUBLIC, TPMT_SIGNATURE, TPM2B_ATTEST
sig = TPMT_SIGNATURE.unmarshal(base64.b64decode(tpm["signature_b64"]))[0]
# 2. Verify the signature over the quote using the AIK public key
# (use a TPM ESAPI verify_signature call or an offline RSA verify)
# 3. Inspect the qualifying_data inside the quote — it must match
# SHA-256(payload_id.encode())[:16] to confirm this quote is for this request
# 4. Check pcr_values against your known-good baseline
```
> Full verification requires `tpm2-pytss` on the client side (`pip install tpm2-pytss` + `sudo apt install libtss2-dev`). It is optional — the attestation is informational unless your deployment policy requires verification.
For `maximum` tier, the server enforces TPM availability as a hard requirement. If your server has no TPM and you request `maximum`, catch the error explicitly: