Available Models
All models are available via api.nomyo.ai. Pass the model ID string directly as the `model` field of `create()`.
Model List
| Model ID | Parameters | Type | Notes |
|---|---|---|---|
| Qwen/Qwen3-0.6B | 0.6B | General | Lightweight, fast inference |
| Qwen/Qwen3.5-0.8B | 0.8B | General | Lightweight, fast inference |
| LiquidAI/LFM2.5-1.2B-Thinking | 1.2B | Thinking | Reasoning model |
| ibm-granite/granite-4.0-h-small | Small | General | IBM Granite 4.0, enterprise-focused |
| Qwen/Qwen3.5-9B | 9B | General | Balanced quality and speed |
| utter-project/EuroLLM-9B-Instruct-2512 | 9B | General | Multilingual, strong European language support |
| zai-org/GLM-4.7-Flash | — | General | Fast GLM variant |
| mistralai/Ministral-3-14B-Instruct-2512-GGUF | 14B | General | Mistral instruction-tuned |
| ServiceNow-AI/Apriel-1.6-15b-Thinker | 15B | Thinking | Reasoning model |
| openai/gpt-oss-20b | 20B | General | OpenAI open-weight release |
| LiquidAI/LFM2-24B-A2B | 24B (2B active) | General | MoE — efficient inference |
| Qwen/Qwen3.5-27B | 27B | General | High quality, large context |
| google/medgemma-27b-it | 27B | Specialized | Medical domain, instruction-tuned |
| nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 | 30B (3B active) | General | MoE — efficient inference |
| Qwen/Qwen3.5-35B-A3B | 35B (3B active) | General | MoE — efficient inference |
| moonshotai/Kimi-Linear-48B-A3B-Instruct | 48B (3B active) | General | MoE — large capacity, efficient inference |
MoE (Mixture of Experts) models show total/active parameter counts. Only active parameters are used per token, keeping inference cost low relative to total model size.
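As a rough illustration of that point, the active/total ratio gives a feel for per-token compute relative to a dense model of the same total size. This is a sketch using the parameter counts from the table above, not an official cost formula:

```typescript
// Per-token "active weight" fraction for the MoE models listed above.
// Parameter counts are read off the model IDs; illustrative only.
type MoeModel = { id: string; totalB: number; activeB: number };

const moeModels: MoeModel[] = [
  { id: 'LiquidAI/LFM2-24B-A2B', totalB: 24, activeB: 2 },
  { id: 'nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4', totalB: 30, activeB: 3 },
  { id: 'Qwen/Qwen3.5-35B-A3B', totalB: 35, activeB: 3 },
  { id: 'moonshotai/Kimi-Linear-48B-A3B-Instruct', totalB: 48, activeB: 3 },
];

// Fraction of parameters touched per token, versus a dense model of equal size.
function activeFraction(m: MoeModel): number {
  return m.activeB / m.totalB;
}

for (const m of moeModels) {
  console.log(`${m.id}: ~${(activeFraction(m) * 100).toFixed(0)}% of weights active per token`);
}
```

So Kimi-Linear-48B-A3B touches only about 6% of its weights per token, which is why its inference cost sits far below what its total size suggests.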
Usage
```ts
import { SecureChatCompletion } from 'nomyo-js';

const client = new SecureChatCompletion({ apiKey: process.env.NOMYO_API_KEY });

const response = await client.create({
  model: 'Qwen/Qwen3.5-9B',
  messages: [{ role: 'user', content: 'Hello!' }],
});
```
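Assuming the response follows the OpenAI-compatible shape used throughout this page (`choices[0].message.content`), the reply text can be pulled out with a small helper like this sketch:

```typescript
// Minimal sketch: extract the assistant's text from a response, assuming the
// choices[0].message.content shape shown elsewhere on this page.
interface ChatMessage {
  role: string;
  content: string;
  reasoning_content?: string; // present on Thinking models (see below)
}

interface ChatResponse {
  choices: { message: ChatMessage }[];
}

// Returns the first choice's text, or '' if there are no choices.
function replyText(response: ChatResponse): string {
  return response.choices[0]?.message.content ?? '';
}
```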
Choosing a Model
| Goal | Recommended models |
|---|---|
| Low latency / edge | Qwen/Qwen3-0.6B, Qwen/Qwen3.5-0.8B, LiquidAI/LFM2.5-1.2B-Thinking |
| Balanced quality + speed | Qwen/Qwen3.5-9B, mistralai/Ministral-3-14B-Instruct-2512-GGUF |
| Reasoning / chain-of-thought | LiquidAI/LFM2.5-1.2B-Thinking, ServiceNow-AI/Apriel-1.6-15b-Thinker |
| Multilingual | utter-project/EuroLLM-9B-Instruct-2512 |
| Medical | google/medgemma-27b-it |
| Highest quality | moonshotai/Kimi-Linear-48B-A3B-Instruct, Qwen/Qwen3.5-35B-A3B |
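The table above can also be encoded as a small lookup when selecting a model programmatically. The goal keys here are documentation shorthand, not an API parameter:

```typescript
// Illustrative mapping of the goals in the table above to model IDs.
// The goal names are this page's shorthand, not part of the API.
const recommendations: Record<string, string[]> = {
  'low-latency': ['Qwen/Qwen3-0.6B', 'Qwen/Qwen3.5-0.8B', 'LiquidAI/LFM2.5-1.2B-Thinking'],
  'balanced': ['Qwen/Qwen3.5-9B', 'mistralai/Ministral-3-14B-Instruct-2512-GGUF'],
  'reasoning': ['LiquidAI/LFM2.5-1.2B-Thinking', 'ServiceNow-AI/Apriel-1.6-15b-Thinker'],
  'multilingual': ['utter-project/EuroLLM-9B-Instruct-2512'],
  'medical': ['google/medgemma-27b-it'],
  'highest-quality': ['moonshotai/Kimi-Linear-48B-A3B-Instruct', 'Qwen/Qwen3.5-35B-A3B'],
};

// First recommendation for a goal, or undefined for unknown goals.
function recommendModel(goal: string): string | undefined {
  return recommendations[goal]?.[0];
}
```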
Thinking Models
Models marked Thinking return an additional `reasoning_content` field in the response message alongside the normal `content`. This contains the model's internal chain-of-thought:
```ts
const response = await client.create({
  model: 'LiquidAI/LFM2.5-1.2B-Thinking',
  messages: [{ role: 'user', content: 'Is 9.9 or 9.11 larger?' }],
});

const { content, reasoning_content } = response.choices[0].message;
console.log('Reasoning:', reasoning_content); // internal chain-of-thought
console.log('Answer:', content); // final answer
```
Security Tier Compatibility
Not all models are available on every security tier. If a model is not permitted for the requested tier, the server returns HTTP 403 and the client throws `ForbiddenError`.
```ts
import { ForbiddenError } from 'nomyo-js';

try {
  const response = await client.create({
    model: 'Qwen/Qwen3.5-27B',
    messages: [{ role: 'user', content: '...' }],
    security_tier: 'maximum',
  });
} catch (err) {
  if (err instanceof ForbiddenError) {
    // Model not available at this security tier — retry with a different tier or model
  }
}
```
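One way to handle this is a fallback chain: try models in order and move to the next when one is rejected. The helper below is a sketch, not part of the SDK; the `attempt` and `isForbidden` callbacks are injected so it stays agnostic to the client's exact API:

```typescript
// Sketch of a fallback chain: try each model in order, falling through when an
// attempt is rejected (e.g. a ForbiddenError from a tier mismatch).
// `attempt` and `isForbidden` are injected; this helper is not part of nomyo-js.
async function createWithFallback<T>(
  models: string[],
  attempt: (model: string) => Promise<T>,
  isForbidden: (err: unknown) => boolean,
): Promise<T> {
  let lastErr: unknown;
  for (const model of models) {
    try {
      return await attempt(model);
    } catch (err) {
      if (!isForbidden(err)) throw err; // only fall back on tier rejections
      lastErr = err;
    }
  }
  throw lastErr ?? new Error('no models supplied');
}
```

With the client from earlier, a call might look like `createWithFallback(['Qwen/Qwen3.5-27B', 'Qwen/Qwen3.5-9B'], (m) => client.create({ model: m, messages, security_tier: 'maximum' }), (e) => e instanceof ForbiddenError)`.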