
Model Registry Pricing Audit — April 2026

Date: 2026-04-09
Source: agent_app/routing/model_registry.py vs. the OpenRouter /api/v1/models API
Methodology: Registered USD-per-million-token ($/M) costs compared against live OpenRouter pricing. Drift = (registered − actual) / actual × 100. Positive drift means users are overcharged; negative drift means the platform absorbs the cost.
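The drift formula, the status bands, and the sort order used in the summary table can be expressed as a few small helpers. This is a minimal sketch, not code from model_registry.py; the names `drift_pct`, `max_abs_drift`, and `status_for` are hypothetical, and the band edges (±5%, 25%) come from the summary table's legend.

```python
def drift_pct(registered: float, actual: float) -> float:
    """Percent drift of a registered $/M price against the live price.

    Positive means users are overcharged; negative means the platform absorbs cost.
    """
    if actual == 0:
        raise ValueError("actual price must be non-zero")
    return (registered - actual) / actual * 100


def max_abs_drift(in_drift: float, out_drift: float) -> float:
    """Sort key for the summary table: the larger of the two drifts by magnitude."""
    return max(abs(in_drift), abs(out_drift))


def status_for(in_drift: float, out_drift: float) -> str:
    """Map a model's worst drift onto the legend's three bands."""
    worst = max_abs_drift(in_drift, out_drift)
    if worst <= 5:
        return "✅"
    if worst <= 25:
        return "⚠️"
    return "❌"


# openai/gpt-5-image-mini: registered $1.25/$5.00, actual $2.50/$2.00
d_in = drift_pct(1.25, 2.50)   # -50.0
d_out = drift_pct(5.00, 2.00)  # 150.0
print(d_in, d_out, status_for(d_in, d_out))
```

Sorting rows in descending order of `max_abs_drift` reproduces the table's ordering.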


Section 1: Summary Table

Rows are sorted by maximum absolute drift (input or output), descending. Status: ✅ within ±5%, ⚠️ 5–25%, ❌ >25%.

| Model Slug | Reg. In $/M | Actual In $/M | In Drift % | Reg. Out $/M | Actual Out $/M | Out Drift % | Status |
|---|---|---|---|---|---|---|---|
| anthropic/claude-opus-4.5 | 15.00 | 5.00 | +200.0% | 75.00 | 25.00 | +200.0% | ❌ |
| openai/gpt-5-image-mini | 1.25 | 2.50 | −50.0% | 5.00 | 2.00 | +150.0% | ❌ |
| openai/gpt-5 | 3.00 | 1.25 | +140.0% | 12.00 | 10.00 | +20.0% | ❌ |
| x-ai/grok-4 | 0.20 | 3.00 | −93.3% | 1.50 | 15.00 | −90.0% | ❌ |
| deepcogito/cogito-v2.1-671b | 0.14 | 1.25 | −88.8% | 0.80 | 1.25 | −36.0% | ❌ |
| google/gemini-2.5-flash-image | 0.10 | 0.30 | −66.7% | 0.40 | 2.50 | −84.0% | ❌ |
| x-ai/grok-code-fast-1 | 0.05 | 0.20 | −75.0% | 0.25 | 1.50 | −83.3% | ❌ |
| google/gemini-2.5-flash | 0.15 | 0.30 | −50.0% | 0.60 | 2.50 | −76.0% | ❌ |
| openai/gpt-5-image | 2.50 | 10.00 | −75.0% | 10.00 | 10.00 | 0.0% | ❌ |
| x-ai/grok-4-fast | 0.05 | 0.20 | −75.0% | 0.25 | 0.50 | −50.0% | ❌ |
| qwen/qwen2.5-coder-7b-instruct | 0.05 | 0.03 | +66.7% | 0.05 | 0.09 | −44.4% | ❌ |
| google/gemini-3-pro-image-preview | 1.25 | 2.00 | −37.5% | 5.00 | 12.00 | −58.3% | ❌ |
| moonshotai/kimi-k2-0905 | 0.60 | 0.40 | +50.0% | 2.00 | 2.00 | 0.0% | ❌ |
| openai/gpt-5.1 | 1.25 | 1.25 | 0.0% | 5.00 | 10.00 | −50.0% | ❌ |
| deepseek/deepseek-v3.1-terminus | 0.14 | 0.21 | −33.3% | 0.80 | 0.79 | +1.3% | ❌ |
| deepseek/deepseek-chat-v3-0324 | 0.14 | 0.20 | −30.0% | 0.80 | 0.77 | +3.9% | ❌ |
| deepseek/deepseek-r1-0528 | 0.55 | 0.45 | +22.2% | 2.19 | 2.15 | +1.9% | ⚠️ |
| moonshotai/kimi-k2-thinking | 0.60 | 0.60 | 0.0% | 2.00 | 2.50 | −20.0% | ⚠️ |
| deepseek/deepseek-chat-v3.1 | 0.14 | 0.15 | −6.7% | 0.80 | 0.75 | +6.7% | ⚠️ |
| deepseek/deepseek-v3.2 | 0.25 | 0.26 | −3.8% | 0.38 | 0.38 | 0.0% | ✅ |
| anthropic/claude-opus-4.6 | 5.00 | 5.00 | 0.0% | 25.00 | 25.00 | 0.0% | ✅ |
| anthropic/claude-sonnet-4.6 | 3.00 | 3.00 | 0.0% | 15.00 | 15.00 | 0.0% | ✅ |
| anthropic/claude-sonnet-4.5 | 3.00 | 3.00 | 0.0% | 15.00 | 15.00 | 0.0% | ✅ |
| anthropic/claude-haiku-4.5 | 1.00 | 1.00 | 0.0% | 5.00 | 5.00 | 0.0% | ✅ |
| anthropic/claude-sonnet-4 | 3.00 | 3.00 | 0.0% | 15.00 | 15.00 | 0.0% | ✅ |
| anthropic/claude-opus-4 | 15.00 | 15.00 | 0.0% | 75.00 | 75.00 | 0.0% | ✅ |
| anthropic/claude-3-haiku | 0.25 | 0.25 | 0.0% | 1.25 | 1.25 | 0.0% | ✅ |
| google/gemini-3.1-pro-preview | 2.00 | 2.00 | 0.0% | 12.00 | 12.00 | 0.0% | ✅ |
| google/gemini-3-flash-preview | 0.50 | 0.50 | 0.0% | 3.00 | 3.00 | 0.0% | ✅ |
| google/gemini-2.5-flash-lite | 0.10 | 0.10 | 0.0% | 0.40 | 0.40 | 0.0% | ✅ |
| google/gemini-2.0-flash-001 | 0.10 | 0.10 | 0.0% | 0.40 | 0.40 | 0.0% | ✅ |
| google/gemini-3.1-flash-image-preview | 0.50 | 0.50 | 0.0% | 3.00 | 3.00 | 0.0% | ✅ |
| x-ai/grok-4.1-fast | 0.20 | 0.20 | 0.0% | 0.50 | 0.50 | 0.0% | ✅ |
| meta-llama/llama-4-maverick | 0.15 | 0.15 | 0.0% | 0.60 | 0.60 | 0.0% | ✅ |
| meta-llama/llama-guard-4-12b | 0.18 | 0.18 | 0.0% | 0.18 | 0.18 | 0.0% | ✅ |
| openai/gpt-5.4 | 2.50 | 2.50 | 0.0% | 15.00 | 15.00 | 0.0% | ✅ |
| openai/o3 | 2.00 | 2.00 | 0.0% | 8.00 | 8.00 | 0.0% | ✅ |
| openai/gpt-4o | 2.50 | 2.50 | 0.0% | 10.00 | 10.00 | 0.0% | ✅ |
| openai/gpt-3.5-turbo | 0.50 | 0.50 | 0.0% | 1.50 | 1.50 | 0.0% | ✅ |
| xiaomi/mimo-v2-flash | 0.09 | 0.09 | 0.0% | 0.29 | 0.29 | 0.0% | ✅ |

Section 2: Per-Model Notes

URGENT — Models on Active Code Paths with Severe Drift

openai/gpt-5-image (−75% input drift) — ABSORBING COST

Registry lists $2.50/M input, but OpenRouter charges $10.00/M; output ($10.00/M) is accurate. This is the primary OpenRouter image generation model in agent_app/virtual_run/image_providers/openrouter.py. Whenever the hardcoded price is in effect, the input tokens of every image generation call cost the platform 4× what is recorded. The context window is also stale: registered 128K vs. actual 400K.

Recommended action: Update input cost to $10.00/M immediately. This is a live execution path.

openai/gpt-5-image-mini (−50% input, +150% output) — MIXED

Registry: $1.25/$5.00. Actual: $2.50/$2.00. Used as the secondary image model in image_providers/openrouter.py. Input is under-recorded (the platform absorbs the difference) while output is over-recorded (users are overcharged on output). The net effect depends on the input/output token ratio of image calls.

Recommended action: Update to $2.50/$2.00. This is a live execution path.
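Because input is under-recorded and output over-recorded, whether a gpt-5-image-mini call nets out as overcharging or absorbing depends on its token mix. The sketch below works through that arithmetic with the registered and actual prices above. The function names are illustrative, and it assumes cost is purely input tokens × input price plus output tokens × output price, ignoring any other pricing dimensions an image call might carry.

```python
def call_cost(input_tokens: int, output_tokens: int,
              in_per_m: float, out_per_m: float) -> float:
    """USD cost of one call given $/M-token prices."""
    return (input_tokens * in_per_m + output_tokens * out_per_m) / 1_000_000


def net_drift_pct(input_tokens, output_tokens, registered, actual) -> float:
    """Drift of the recorded cost vs. the true cost for one call's token mix."""
    recorded = call_cost(input_tokens, output_tokens, *registered)
    true_cost = call_cost(input_tokens, output_tokens, *actual)
    return (recorded - true_cost) / true_cost * 100


def break_even_ratio(registered, actual) -> float:
    """Input/output token ratio at which recorded and true costs are equal.

    Solves (ri - ai) * I + (ro - ao) * O = 0 for I / O; only meaningful when
    the input and output drifts have opposite signs (a "mixed" model).
    """
    (ri, ro), (ai, ao) = registered, actual
    return (ro - ao) / (ai - ri)


REGISTERED = (1.25, 5.00)  # gpt-5-image-mini registry: $/M input, $/M output
ACTUAL = (2.50, 2.00)      # gpt-5-image-mini on OpenRouter

print(break_even_ratio(REGISTERED, ACTUAL))                  # 2.4
print(round(net_drift_pct(1000, 1000, REGISTERED, ACTUAL), 1))    # output-heavy mix
print(round(net_drift_pct(10_000, 1000, REGISTERED, ACTUAL), 1))  # input-heavy mix
```

Under these assumptions, calls whose input exceeds about 2.4× their output tokens are under-recorded overall (the platform absorbs cost), and more output-heavy calls are over-recorded.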

x-ai/grok-4 (−93% input, −90% output) — ABSORBING COST

Registry: $0.20/$1.50. Actual: $3.00/$15.00. Listed in the settings UI (model_presets.py:732) and in model_config.py:169 as an available model. It is not in default presets or the pipeline, but if an admin selects it, the platform absorbs roughly 90–93% of the true cost. The registered price appears to date from the original Grok 3/early Grok 4 era, before a major price increase.

Recommended action: Update to $3.00/$15.00, or remove it from the settings UI if it is no longer cost-competitive.

x-ai/grok-4-fast (−75% input, −50% output) — ABSORBING COST

Registry: $0.05/$0.25. Actual: $0.20/$0.50. Listed in the settings UI (model_presets.py:708) and model_config.py:155. Same pattern as grok-4: early pricing that was never updated.

Recommended action: Update to $0.20/$0.50.

openai/gpt-5.1 (−50% output drift) — ABSORBING COST

Registry: $5.00/M output. Actual: $10.00/M output. Input is accurate ($1.25/M). Listed in the settings UI (model_presets.py:454). The context window is also stale: registered 272K vs. actual 400K. If a user selects this model, the platform absorbs half the output cost.

Recommended action: Update output cost to $10.00/M and context window to 400K.

openai/gpt-5 (+140% input drift) — OVERCHARGING USERS

Registry: $3.00/M input. Actual: $1.25/M input. Output moderate: registered $12.00 vs actual $10.00 (+20%). Not in any default preset or active pipeline stage. Referenced in verify_gpt52_model.py (test file) and legacy code only. Context window stale: 128K vs 400K.

Recommended action: Update to $1.25/$10.00, or mark deprecated if not needed.

Non-Pipeline Models with Severe Drift

anthropic/claude-opus-4.5 (+200% both directions) — OVERCHARGING

Registry: $15.00/$75.00. Actual: $5.00/$25.00. Marked deprecated=True. The registry entry carries the old Claude Opus 4 pricing ($15/$75), but Anthropic priced Opus 4.5 at $5/$25. Any code path still referencing this model records 3× the true cost. Not in pipeline or presets.

Recommended action: Update to $5.00/$25.00 so historical cost records stay accurate, even though the model is deprecated.

deepcogito/cogito-v2.1-671b (−89% input, −36% output) — ABSORBING COST

Registry: $0.14/$0.80. Actual: $1.25/$1.25. Not in pipeline or presets. The registered price appears to have been copied from DeepSeek V3 pricing rather than the actual Cogito price. If the model is selected via admin override, the platform absorbs ~89% of the input cost.

Recommended action: Update to $1.25/$1.25.

google/gemini-2.5-flash (−50% input, −76% output) — ABSORBING COST

Registry: $0.15/$0.60. Actual: $0.30/$2.50. Not in the active pipeline (superseded by Gemini 3 variants), but referenced in capability_requirements.py and the legacy chat app. The output price is particularly stale: registered at $0.60 vs. actual $2.50 (4.2× under).

Recommended action: Update to $0.30/$2.50 or mark deprecated.

google/gemini-2.5-flash-image (−67% input, −84% output) — ABSORBING COST

Registry: $0.10/$0.40. Actual: $0.30/$2.50. The context window is severely stale: registered 1M vs. actual 32K. The pricing was likely copied from the non-image Gemini 2.0 Flash entry.

Recommended action: Update pricing to $0.30/$2.50 and the context window to 32,768.

google/gemini-3-pro-image-preview (−38% input, −58% output) — ABSORBING COST

Registry: $1.25/$5.00. Actual: $2.00/$12.00. Context window stale: registered 1M vs. actual 65K. Used as an image model option.

Recommended action: Update to $2.00/$12.00 and the context window to 65,536.

x-ai/grok-code-fast-1 (−75% input, −83% output) — ABSORBING COST

Registry: $0.05/$0.25. Actual: $0.20/$1.50. Not in pipeline or presets. Context window stale: 128K vs. 256K.

Recommended action: Update to $0.20/$1.50.

moonshotai/kimi-k2-0905 (+50% input) — OVERCHARGING

Registry: $0.60/M input. Actual: $0.40/M. Output accurate. Not in pipeline or presets.

Recommended action: Update input to $0.40/M.

deepseek/deepseek-v3.1-terminus (−33% input) — ABSORBING COST

Registry: $0.14/M input. Actual: $0.21/M. Output accurate. Not in pipeline or presets. Context window stale: 128K vs 164K.

Recommended action: Update input to $0.21/M.

qwen/qwen2.5-coder-7b-instruct (+67% input, −44% output) — MIXED

Registry: $0.05/$0.05. Actual: $0.03/$0.09. Not in pipeline or presets. Small absolute amounts.

Recommended action: Update to $0.03/$0.09. Low urgency.

Models with Moderate Drift (5–25%)

  • deepseek/deepseek-r1-0528: Input overcharged +22% ($0.55 vs $0.45). Output accurate. Not in active pipeline.
  • moonshotai/kimi-k2-thinking: Output absorbing −20% ($2.00 vs $2.50). Input accurate. Not in pipeline.
  • deepseek/deepseek-chat-v3.1: Minor mixed drift (~7% each direction). Not in pipeline.

Context Window Mismatches (Notable)

| Model | Registered | Actual | Factor |
|---|---|---|---|
| google/gemini-2.5-flash-image | 1,000,000 | 32,768 | 30× over |
| google/gemini-3-pro-image-preview | 1,000,000 | 65,536 | 15× over |
| openai/gpt-5 | 128,000 | 400,000 | 3× under |
| openai/gpt-5-image | 128,000 | 400,000 | 3× under |
| openai/gpt-5-image-mini | 128,000 | 400,000 | 3× under |
| openai/gpt-5.1 | 272,000 | 400,000 | 1.5× under |
| x-ai/grok-4.1-fast | 128,000 | 2,000,000 | 15.6× under |
| meta-llama/llama-guard-4-12b | 32,000 | 163,840 | 5× under |
| deepseek/deepseek-v3.2 | 64,000 | 163,840 | 2.6× under |
| deepseek/deepseek-r1-0528 | 64,000 | 163,840 | 2.6× under |
| google/gemini-3.1-flash-image-preview | 131,000 | 65,536 | 2× over |
| x-ai/grok-code-fast-1 | 128,000 | 256,000 | 2× under |

Note: x-ai/grok-4.1-fast is in the active pipeline (twitter_search, native_platform_search) with a context window registered at 128K but actually 2M. This doesn't affect billing but may cause unnecessary prompt truncation.

deepseek/deepseek-v3.2 is also pipeline-active (10 stages) with context window 64K vs actual 164K.
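The Factor column is the larger window divided by the smaller, labeled by direction. The sketch below computes it and, under the assumption (not confirmed by this audit) that the app caps prompts at the registered window, shows the two failure modes: unused capacity when the window is under-registered, and requests that could exceed the provider's real limit when it is over-registered. All function names are hypothetical.

```python
def context_mismatch(registered: int, actual: int) -> tuple[float, str]:
    """Return (factor, direction) as in the Context Window Mismatches table."""
    if registered >= actual:
        return registered / actual, "over"
    return actual / registered, "under"


def wasted_tokens(registered: int, actual: int) -> int:
    """Real capacity left unused if prompts are capped at the registered
    window (assumes the app truncates to the registered value)."""
    return max(actual - registered, 0)


def overflow_tokens(registered: int, actual: int) -> int:
    """How far a prompt sized to an over-registered window could exceed
    the provider's real limit (same assumption)."""
    return max(registered - actual, 0)


for slug, reg, act in [
    ("x-ai/grok-4.1-fast", 128_000, 2_000_000),
    ("google/gemini-2.5-flash-image", 1_000_000, 32_768),
]:
    factor, direction = context_mismatch(reg, act)
    print(f"{slug}: {factor:.1f}x {direction}, "
          f"unused={wasted_tokens(reg, act):,}, overflow={overflow_tokens(reg, act):,}")
```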


Section 3: Models Missing from OpenRouter

These 14 registered models were not found in the OpenRouter API catalog. Any code path that routes to them will fail at runtime unless caught by fallback logic.

| Registry Key | OpenRouter ID | Deprecated? | In Pipeline? | Risk |
|---|---|---|---|---|
| claude-3.5-sonnet | anthropic/claude-3.5-sonnet | Yes | No | Low: deprecated, fallbacks exist |
| claude-haiku-3.5 | anthropic/claude-haiku-3.5 | Yes | No | Low: deprecated |
| gemini-3-pro-preview | google/gemini-3-pro-preview | No | No | Medium: active registry entry, not deprecated |
| gemini-2.0-flash | google/gemini-2.0-flash | No | No | Medium: active registry entry |
| gemini-2.0-flash-exp | google/gemini-2.0-flash-exp | Yes | No | Low: deprecated |
| gemini-pro-1.5 | google/gemini-pro-1.5 | Yes | No | Low: deprecated |
| gemini-2.5-pro-preview-06-05 | google/gemini-2.5-pro-preview-06-05 | Yes | No | Low: deprecated |
| grok-4.1 | x-ai/grok-4.1 | No | No | Medium: active registry entry |
| grok-3-fast-beta | x-ai/grok-3-fast-beta | Yes | No | Low: deprecated |
| llama-3.3-70b-versatile | groq/llama-3.3-70b-versatile | Yes | No | Low: deprecated |
| text-embedding-3-small | openai/text-embedding-3-small | No | Yes (memory_embedding) | Low: embedding models are often not listed in OpenRouter's chat model API; likely still functional via direct calls |
| seedream | bytedance-seed/seedream-4.5 | No | No | Medium: active entry, image model |
| flux2-pro | black-forest-labs/flux.2-pro | No | No | Medium: active entry, image model |
| glm-air | zhipu/glm-4.5-air | No | No | Medium: active entry |

Key concern: 7 non-deprecated models are missing from OpenRouter. Setting aside text-embedding-3-small (embedding models are often not listed in the chat model API), the remaining 6 (gemini-3-pro-preview, gemini-2.0-flash, grok-4.1, seedream-4.5, flux.2-pro, and glm-4.5-air) should be verified manually: they may have been renamed, moved to different slugs, or removed from OpenRouter's catalog.
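The "found/missing" check and the $/M comparison can be scripted against the catalog. This sketch assumes the endpoint returns `{"data": [{"id", "context_length", "pricing": {"prompt", "completion"}}]}` with prices as per-token USD strings; that shape, and the helper names, should be confirmed against OpenRouter's documentation before relying on it. Prices are parsed with `Decimal` to avoid float rounding on tiny per-token values.

```python
import json
import urllib.request
from decimal import Decimal

MODELS_URL = "https://openrouter.ai/api/v1/models"


def fetch_catalog(url: str = MODELS_URL, timeout: float = 30.0) -> dict:
    """Download and parse the model catalog JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def per_million(per_token: str) -> Decimal:
    """Convert a per-token USD price string to USD per million tokens."""
    return Decimal(per_token) * 1_000_000


def index_catalog(payload: dict) -> dict[str, dict]:
    """Map model id -> {'in', 'out', 'context'} from a /models payload."""
    index = {}
    for entry in payload.get("data", []):
        pricing = entry.get("pricing", {})
        index[entry["id"]] = {
            "in": per_million(pricing.get("prompt", "0")),
            "out": per_million(pricing.get("completion", "0")),
            "context": entry.get("context_length"),
        }
    return index


def missing_ids(registry_ids, catalog: dict) -> list[str]:
    """Registry slugs with no exact id match in the catalog."""
    return sorted(set(registry_ids) - set(catalog))


# Offline example with a hand-built payload in the assumed shape
sample = {"data": [{"id": "openai/gpt-4o", "context_length": 128000,
                    "pricing": {"prompt": "0.0000025", "completion": "0.00001"}}]}
catalog = index_catalog(sample)
print(catalog["openai/gpt-4o"]["in"], missing_ids(["openai/gpt-4o", "x-ai/grok-4.1"], catalog))
```

An exact-match miss is only a candidate: a model that moved to a new slug also shows up here, so each miss still needs a manual check.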


Section 4: Summary Statistics

| Metric | Value |
|---|---|
| Total models in registry | 54 |
| Found on OpenRouter | 40 |
| Missing from OpenRouter | 14 |
| Accurate pricing (±5%) | 21 (52.5% of found) |
| Moderate drift (5–25%) | 3 (7.5% of found) |
| Severe drift (>25%) | 16 (40% of found) |
| Overcharging users (severe) | 3 models |
| Platform absorbing cost (severe) | 11 models |
| Mixed direction (severe) | 2 models |
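One way to derive the direction split for severe models from the summary table is to look only at drifts beyond ±25%: all positive means overcharging, all negative means absorbing, and both signs means mixed. This rule is an assumption (the audit does not state its rule), chosen because it agrees with the per-model labels in Section 2; `severe_direction` is a hypothetical helper.

```python
from collections import Counter
from typing import Optional

SEVERE_PCT = 25.0  # boundary of the ❌ band in the summary table


def severe_direction(in_drift: float, out_drift: float) -> Optional[str]:
    """Classify a model by the sign(s) of the drifts that exceed ±25%.

    Returns None when neither input nor output drift is severe.
    """
    severe = [d for d in (in_drift, out_drift) if abs(d) > SEVERE_PCT]
    if not severe:
        return None
    if all(d > 0 for d in severe):
        return "overcharging"
    if all(d < 0 for d in severe):
        return "absorbing"
    return "mixed"


# (slug, input drift %, output drift %) copied from the summary table
rows = [
    ("openai/gpt-5", 140.0, 20.0),
    ("openai/gpt-5-image-mini", -50.0, 150.0),
    ("deepseek/deepseek-v3.1-terminus", -33.3, 1.3),
    ("deepseek/deepseek-r1-0528", 22.2, 1.9),  # moderate, not counted
]
directions = (severe_direction(i, o) for _, i, o in rows)
counts = Counter(d for d in directions if d is not None)
print(counts)
```

Note that only the drift beyond the threshold decides the label, so gpt-5 (+140% input, +20% output) counts as overcharging and deepseek-v3.1-terminus (−33.3% input, +1.3% output) as absorbing.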

Active Pipeline / Preset Impact

All 9 models in PIPELINE_ASSIGNMENTS that were found on OpenRouter have accurate pricing (within ±5%). The daily sync_model_prices Inngest function (agent_app/inngest_functions/sync_model_prices.py) syncs OpenRouter prices to the model_prices DB table, and get_model_cost() in model_registry.py checks DB prices first. This means live billing uses DB-synced prices, not hardcoded values, for any model the sync has processed.

However, the hardcoded values still matter because:

  1. They are the fallback when DB prices are unavailable (cold start, DB outage, or a new model added between syncs).
  2. They are used to build the MODEL_COSTS dicts consumed by CostTracker and RateCardService for pre-execution estimates.
  3. The sync_model_prices function only syncs non-deprecated models, so deprecated models always use hardcoded costs.
  4. The drift alerting threshold in sync_model_prices is 20%; several models exceed it and should be triggering warnings in the logs.
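The lookup order described above (synced DB price first, hardcoded registry value as fallback) and the 20% drift alert can be sketched as follows. This is not the actual get_model_cost() or sync_model_prices code; the `ModelPrice` type, the dict-based stand-ins for the model_prices table and the registry, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping

DRIFT_ALERT_PCT = 20.0  # threshold quoted for sync_model_prices


@dataclass(frozen=True)
class ModelPrice:
    input_per_m: float
    output_per_m: float


def resolve_price(slug: str,
                  db_prices: Mapping[str, ModelPrice],
                  hardcoded: Mapping[str, ModelPrice]) -> tuple[ModelPrice, str]:
    """Prefer the synced DB price; otherwise fall back to the hardcoded value.

    The fallback covers cold starts, DB outages, models added between syncs,
    and deprecated models (which the sync skips).
    """
    synced = db_prices.get(slug)
    if synced is not None:
        return synced, "db"
    return hardcoded[slug], "hardcoded"


def drift_alerts(db_prices: Mapping[str, ModelPrice],
                 hardcoded: Mapping[str, ModelPrice],
                 threshold: float = DRIFT_ALERT_PCT) -> list[str]:
    """Slugs whose hardcoded input or output price drifts past the threshold."""
    alerts = []
    for slug, live in db_prices.items():
        reg = hardcoded.get(slug)
        if reg is None:
            continue
        for r, a in ((reg.input_per_m, live.input_per_m),
                     (reg.output_per_m, live.output_per_m)):
            # Skip zero live prices (free models) to avoid dividing by zero.
            if a and abs((r - a) / a * 100) > threshold:
                alerts.append(slug)
                break
    return sorted(alerts)


hardcoded = {
    "openai/gpt-5-image": ModelPrice(2.50, 10.00),  # stale registry entry
    "openai/gpt-4o": ModelPrice(2.50, 10.00),
}
synced = {
    "openai/gpt-5-image": ModelPrice(10.00, 10.00),  # live OpenRouter price
    "openai/gpt-4o": ModelPrice(2.50, 10.00),
}
print(resolve_price("openai/gpt-5-image", synced, hardcoded))
print(resolve_price("openai/gpt-5-image", {}, hardcoded))  # DB unavailable
print(drift_alerts(synced, hardcoded))
```

The second call shows why stale hardcoded values still leak into billing: with no DB row, gpt-5-image silently reverts to $2.50/M input.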

Models with Severe Drift on Live Code Paths (Priority Fixes)

| Model | Path | Direction | Max Drift |
|---|---|---|---|
| openai/gpt-5-image | Image generation (openrouter provider) | Absorbing | −75% input |
| openai/gpt-5-image-mini | Image generation (openrouter provider) | Mixed | +150% output |
| x-ai/grok-4 | Settings UI selectable | Absorbing | −93% input |
| x-ai/grok-4-fast | Settings UI selectable | Absorbing | −75% input |
| openai/gpt-5.1 | Settings UI selectable | Absorbing | −50% output |
| openai/gpt-5 | Settings UI selectable | Overcharging | +140% input |

Revenue Impact Estimate

Without access to usage_records from the live database, precise revenue impact cannot be calculated in this audit. However:

  • Image generation models (gpt-5-image, gpt-5-image-mini) are on every image generation path. If the hardcoded fallback prices are used (rather than DB-synced prices), the platform records costs at 25% of actuals for gpt-5-image input tokens.
  • grok-4 has the largest negative (platform-absorbing) drift (−93% input), but it is not in default presets, which limits exposure to admin-override usage.
  • All default preset and pipeline models have accurate pricing, which means the bulk of regular traffic is correctly priced.
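If usage_records can be exported, the missing revenue-impact figure reduces to summing, per model, the cost at registered prices minus the cost at actual prices. The sketch below assumes each record carries a model slug and input/output token counts, and that the hardcoded price was the one applied. The record shape and function name are hypothetical, since the real usage_records schema was not inspected for this audit.

```python
from collections import defaultdict
from typing import Iterable, Mapping, Tuple

Price = Tuple[float, float]  # ($/M input, $/M output)


def revenue_impact(records: Iterable[Mapping],
                   registered: Mapping[str, Price],
                   actual: Mapping[str, Price]) -> dict:
    """Recorded USD minus true USD, summed per model.

    Positive totals mean users were overcharged; negative totals mean the
    platform absorbed the difference.
    """
    impact: dict = defaultdict(float)
    for rec in records:
        slug = rec["model"]
        if slug not in registered or slug not in actual:
            continue  # no price pair for this model: skip rather than guess
        (reg_in, reg_out), (act_in, act_out) = registered[slug], actual[slug]
        # tokens × ($/M difference); divide by 1e6 to get USD
        scaled = (rec["input_tokens"] * (reg_in - act_in)
                  + rec["output_tokens"] * (reg_out - act_out))
        impact[slug] += scaled / 1_000_000
    return dict(impact)


registered = {"openai/gpt-5-image": (2.50, 10.00)}
actual = {"openai/gpt-5-image": (10.00, 10.00)}
records = [{"model": "openai/gpt-5-image",
            "input_tokens": 2_000_000, "output_tokens": 500_000}]
print(revenue_impact(records, registered, actual))
```

Only the input side contributes here because gpt-5-image's output price is accurate.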

Recommended Actions

  1. Immediate: Update hardcoded prices for gpt-5-image and gpt-5-image-mini, which are on live execution paths
  2. High priority: Update grok-4, grok-4-fast, gpt-5, gpt-5.1 — available in settings UI
  3. Medium priority: Update cogito-v2.1-671b, gemini-2.5-flash, gemini-2.5-flash-image, gemini-3-pro-image-preview, claude-opus-4.5
  4. Low priority: Update remaining moderate-drift models (deepseek-r1-0528, kimi-k2-thinking, etc.)
  5. Investigate: Verify whether the 6 non-deprecated missing chat/image models (excluding text-embedding-3-small) have been renamed on OpenRouter or should be removed/deprecated
  6. Context windows: Update stale context windows, especially grok-4.1-fast (pipeline-active, 128K→2M) and deepseek-v3.2 (pipeline-active, 64K→164K)
  7. Process: Confirm that the sync_model_prices Inngest job is running successfully and that drift alerts are being monitored