Model Registry Pricing Audit: April 2026
Date: 2026-04-09
Source: agent_app/routing/model_registry.py vs OpenRouter /api/v1/models API
Methodology: Registered USD/M-token costs compared against live OpenRouter pricing. Drift = (registered − actual) / actual × 100. Positive = overcharging users. Negative = platform absorbs cost.
Section 1: Summary Table
Sorted by maximum absolute drift descending. ✅ within ±5%, ⚠️ 5–25%, ❌ >25%.
| Model Slug | Reg In $/M | Actual In $/M | In Drift % | Reg Out $/M | Actual Out $/M | Out Drift % | Status |
|---|---|---|---|---|---|---|---|
| anthropic/claude-opus-4.5 | 15.00 | 5.00 | +200.0% | 75.00 | 25.00 | +200.0% | ❌ |
| openai/gpt-5-image-mini | 1.25 | 2.50 | −50.0% | 5.00 | 2.00 | +150.0% | ❌ |
| openai/gpt-5 | 3.00 | 1.25 | +140.0% | 12.00 | 10.00 | +20.0% | ❌ |
| x-ai/grok-4 | 0.20 | 3.00 | −93.3% | 1.50 | 15.00 | −90.0% | ❌ |
| deepcogito/cogito-v2.1-671b | 0.14 | 1.25 | −88.8% | 0.80 | 1.25 | −36.0% | ❌ |
| google/gemini-2.5-flash-image | 0.10 | 0.30 | −66.7% | 0.40 | 2.50 | −84.0% | ❌ |
| x-ai/grok-code-fast-1 | 0.05 | 0.20 | −75.0% | 0.25 | 1.50 | −83.3% | ❌ |
| google/gemini-2.5-flash | 0.15 | 0.30 | −50.0% | 0.60 | 2.50 | −76.0% | ❌ |
| openai/gpt-5-image | 2.50 | 10.00 | −75.0% | 10.00 | 10.00 | 0.0% | ❌ |
| x-ai/grok-4-fast | 0.05 | 0.20 | −75.0% | 0.25 | 0.50 | −50.0% | ❌ |
| qwen/qwen2.5-coder-7b-instruct | 0.05 | 0.03 | +66.7% | 0.05 | 0.09 | −44.4% | ❌ |
| google/gemini-3-pro-image-preview | 1.25 | 2.00 | −37.5% | 5.00 | 12.00 | −58.3% | ❌ |
| moonshotai/kimi-k2-0905 | 0.60 | 0.40 | +50.0% | 2.00 | 2.00 | 0.0% | ❌ |
| openai/gpt-5.1 | 1.25 | 1.25 | 0.0% | 5.00 | 10.00 | −50.0% | ❌ |
| deepseek/deepseek-v3.1-terminus | 0.14 | 0.21 | −33.3% | 0.80 | 0.79 | +1.3% | ❌ |
| deepseek/deepseek-chat-v3-0324 | 0.14 | 0.20 | −30.0% | 0.80 | 0.77 | +3.9% | ❌ |
| deepseek/deepseek-r1-0528 | 0.55 | 0.45 | +22.2% | 2.19 | 2.15 | +1.9% | ⚠️ |
| moonshotai/kimi-k2-thinking | 0.60 | 0.60 | 0.0% | 2.00 | 2.50 | −20.0% | ⚠️ |
| deepseek/deepseek-chat-v3.1 | 0.14 | 0.15 | −6.7% | 0.80 | 0.75 | +6.7% | ⚠️ |
| deepseek/deepseek-v3.2 | 0.25 | 0.26 | −3.8% | 0.38 | 0.38 | 0.0% | ✅ |
| anthropic/claude-opus-4.6 | 5.00 | 5.00 | 0.0% | 25.00 | 25.00 | 0.0% | ✅ |
| anthropic/claude-sonnet-4.6 | 3.00 | 3.00 | 0.0% | 15.00 | 15.00 | 0.0% | ✅ |
| anthropic/claude-sonnet-4.5 | 3.00 | 3.00 | 0.0% | 15.00 | 15.00 | 0.0% | ✅ |
| anthropic/claude-haiku-4.5 | 1.00 | 1.00 | 0.0% | 5.00 | 5.00 | 0.0% | ✅ |
| anthropic/claude-sonnet-4 | 3.00 | 3.00 | 0.0% | 15.00 | 15.00 | 0.0% | ✅ |
| anthropic/claude-opus-4 | 15.00 | 15.00 | 0.0% | 75.00 | 75.00 | 0.0% | ✅ |
| anthropic/claude-3-haiku | 0.25 | 0.25 | 0.0% | 1.25 | 1.25 | 0.0% | ✅ |
| google/gemini-3.1-pro-preview | 2.00 | 2.00 | 0.0% | 12.00 | 12.00 | 0.0% | ✅ |
| google/gemini-3-flash-preview | 0.50 | 0.50 | 0.0% | 3.00 | 3.00 | 0.0% | ✅ |
| google/gemini-2.5-flash-lite | 0.10 | 0.10 | 0.0% | 0.40 | 0.40 | 0.0% | ✅ |
| google/gemini-2.0-flash-001 | 0.10 | 0.10 | 0.0% | 0.40 | 0.40 | 0.0% | ✅ |
| google/gemini-3.1-flash-image-preview | 0.50 | 0.50 | 0.0% | 3.00 | 3.00 | 0.0% | ✅ |
| x-ai/grok-4.1-fast | 0.20 | 0.20 | 0.0% | 0.50 | 0.50 | 0.0% | ✅ |
| meta-llama/llama-4-maverick | 0.15 | 0.15 | 0.0% | 0.60 | 0.60 | 0.0% | ✅ |
| meta-llama/llama-guard-4-12b | 0.18 | 0.18 | 0.0% | 0.18 | 0.18 | 0.0% | ✅ |
| openai/gpt-5.4 | 2.50 | 2.50 | 0.0% | 15.00 | 15.00 | 0.0% | ✅ |
| openai/o3 | 2.00 | 2.00 | 0.0% | 8.00 | 8.00 | 0.0% | ✅ |
| openai/gpt-4o | 2.50 | 2.50 | 0.0% | 10.00 | 10.00 | 0.0% | ✅ |
| openai/gpt-3.5-turbo | 0.50 | 0.50 | 0.0% | 1.50 | 1.50 | 0.0% | ✅ |
| xiaomi/mimo-v2-flash | 0.09 | 0.09 | 0.0% | 0.29 | 0.29 | 0.0% | ✅ |
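The drift and status columns above can be reproduced with a few lines of Python. This is a minimal sketch, not the audit's actual tooling: the function names are hypothetical, and the handling of values that land exactly on 5% or 25% is an assumption, since the legend does not say which band owns the boundary.

```python
def drift_pct(registered: float, actual: float) -> float:
    """Drift = (registered - actual) / actual * 100, as in the methodology."""
    return (registered - actual) / actual * 100.0


def status(in_drift: float, out_drift: float) -> str:
    """Classify a model by its larger absolute drift.

    Assumption: exactly 5% counts as "ok" and exactly 25% as "warn".
    """
    worst = max(abs(in_drift), abs(out_drift))
    if worst <= 5.0:
        return "ok"
    if worst <= 25.0:
        return "warn"
    return "severe"


# (registered_in, actual_in, registered_out, actual_out) in USD per million tokens,
# copied from three rows of the summary table.
rows = {
    "x-ai/grok-4": (0.20, 3.00, 1.50, 15.00),
    "deepseek/deepseek-r1-0528": (0.55, 0.45, 2.19, 2.15),
    "deepseek/deepseek-v3.2": (0.25, 0.26, 0.38, 0.38),
}

for slug, (reg_in, act_in, reg_out, act_out) in rows.items():
    d_in = drift_pct(reg_in, act_in)
    d_out = drift_pct(reg_out, act_out)
    print(f"{slug}: in {d_in:+.1f}%, out {d_out:+.1f}%, {status(d_in, d_out)}")
```

Sorting by `max(abs(d_in), abs(d_out))` in descending order gives the row order used in the table.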
Section 2: Per-Model Notes
URGENT: Models on Active Code Paths with Severe Drift
openai/gpt-5-image (−75% input drift): ABSORBING COST
Registry lists $2.50/M input but OpenRouter charges $10.00/M. Used in agent_app/virtual_run/image_providers/openrouter.py as the primary OpenRouter image generation model. Input tokens on every image generation call cost the platform 4× what is recorded; output pricing is accurate. Context window also stale: registered 128K vs actual 400K.
Recommended action: Update input cost to $10.00/M immediately. This is a live execution path.
openai/gpt-5-image-mini (−50% input, +150% output): MIXED
Registry: $1.25/$5.00. Actual: $2.50/$2.00. Used as the secondary image model in image_providers/openrouter.py. Input is under-recorded (platform absorbs), output is over-recorded (users overcharged on output). Net effect depends on input/output ratio for image calls.
Recommended action: Update to $2.50/$2.00. This is a live execution path.
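How the net effect for gpt-5-image-mini depends on the input/output ratio can be worked out from the four prices alone. The sketch below is illustrative: the function names are hypothetical and the token counts are invented for the example, not taken from usage data.

```python
def net_recorded_minus_actual(input_tokens: int, output_tokens: int,
                              reg_in: float = 1.25, reg_out: float = 5.00,
                              act_in: float = 2.50, act_out: float = 2.00) -> float:
    """USD difference between recorded and actual cost for one call.

    Prices are USD per million tokens. A positive result means the call is
    over-recorded; a negative result means the platform absorbs the gap.
    """
    recorded = (input_tokens * reg_in + output_tokens * reg_out) / 1_000_000
    actual = (input_tokens * act_in + output_tokens * act_out) / 1_000_000
    return recorded - actual


def breakeven_input_output_ratio(reg_in: float, reg_out: float,
                                 act_in: float, act_out: float) -> float:
    """Input/output token ratio at which recorded and actual cost are equal."""
    return (reg_out - act_out) / (act_in - reg_in)


ratio = breakeven_input_output_ratio(1.25, 5.00, 2.50, 2.00)
print(f"break-even input/output ratio: {ratio:.2f}")

# Hypothetical calls: one input-heavy, one balanced.
print(net_recorded_minus_actual(10_000, 1_000))  # negative: platform absorbs
print(net_recorded_minus_actual(1_000, 1_000))   # positive: over-recorded
```

With these prices the break-even sits at 2.4 input tokens per output token: calls with a higher ratio are under-recorded, calls with a lower ratio are over-recorded.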
x-ai/grok-4 (−93% input, −90% output): ABSORBING COST
Registry: $0.20/$1.50. Actual: $3.00/$15.00. Listed in settings UI (model_presets.py:732) and model_config.py:169 as an available model. Not in default presets or pipeline, but if an admin selects it, the platform absorbs ~93% of the true input cost and 90% of the true output cost. The registered price appears to date from the original Grok 3/early Grok 4 era, before a major price increase.
Recommended action: Update to $3.00/$15.00 or remove from settings UI if no longer cost-competitive.
x-ai/grok-4-fast (−75% input, −50% output): ABSORBING COST
Registry: $0.05/$0.25. Actual: $0.20/$0.50. Listed in settings UI (model_presets.py:708) and model_config.py:155. Same pattern as grok-4: early pricing never updated.
Recommended action: Update to $0.20/$0.50.
openai/gpt-5.1 (−50% output drift): ABSORBING COST
Registry: $5.00/M output. Actual: $10.00/M output. Input is accurate ($1.25). Listed in settings UI (model_presets.py:454). Context window also stale: registered 272K vs actual 400K. Platform absorbs half the output cost if a user selects this model.
Recommended action: Update output cost to $10.00/M and context window to 400K.
openai/gpt-5 (+140% input drift): OVERCHARGING USERS
Registry: $3.00/M input. Actual: $1.25/M input. Output moderate: registered $12.00 vs actual $10.00 (+20%). Not in any default preset or active pipeline stage. Referenced in verify_gpt52_model.py (test file) and legacy code only. Context window stale: 128K vs 400K.
Recommended action: Update to $1.25/$10.00 or mark deprecated if not needed.
Non-Pipeline Models with Severe Drift
anthropic/claude-opus-4.5 (+200% input and output): OVERCHARGING
Registry: $15.00/$75.00. Actual: $5.00/$25.00. Marked deprecated=True. The registry entry carries the old Claude Opus 4 pricing ($15/$75), but Anthropic priced Opus 4.5 at $5/$25. Any code path still referencing this model records 3× the true cost. Not in pipeline or presets.
Recommended action: Update to $5.00/$25.00 for historical accuracy of cost records, even though deprecated.
deepcogito/cogito-v2.1-671b (−89% input, −36% output): ABSORBING COST
Registry: $0.14/$0.80. Actual: $1.25/$1.25. Not in pipeline or presets. The registered price appears copied from DeepSeek V3 pricing, not the actual Cogito model price. If selected via admin override, the platform absorbs ~89% of input cost.
Recommended action: Update to $1.25/$1.25.
google/gemini-2.5-flash (−50% input, −76% output): ABSORBING COST
Registry: $0.15/$0.60. Actual: $0.30/$2.50. Not in active pipeline (superseded by Gemini 3 variants). Referenced in capability_requirements.py and legacy chat app. The output price is particularly stale: registered at $0.60 vs actual $2.50 (4.2× under).
Recommended action: Update to $0.30/$2.50 or mark deprecated.
google/gemini-2.5-flash-image (−67% input, −84% output): ABSORBING COST
Registry: $0.10/$0.40. Actual: $0.30/$2.50. Context window severely stale: registered 1M vs actual 32K. The pricing was likely copied from the non-image Gemini 2.0 Flash entry.
Recommended action: Update pricing to $0.30/$2.50, context window to 32,768.
google/gemini-3-pro-image-preview (−38% input, −58% output): ABSORBING COST
Registry: $1.25/$5.00. Actual: $2.00/$12.00. Context window stale: registered 1M vs actual 65K. Used as an image model option.
Recommended action: Update to $2.00/$12.00, context window to 65,536.
x-ai/grok-code-fast-1 (−75% input, −83% output): ABSORBING COST
Registry: $0.05/$0.25. Actual: $0.20/$1.50. Not in pipeline or presets. Context window stale: 128K vs 256K.
Recommended action: Update to $0.20/$1.50.
moonshotai/kimi-k2-0905 (+50% input): OVERCHARGING
Registry: $0.60/M input. Actual: $0.40/M. Output accurate. Not in pipeline or presets.
Recommended action: Update input to $0.40/M.
deepseek/deepseek-v3.1-terminus (−33% input): ABSORBING COST
Registry: $0.14/M input. Actual: $0.21/M. Output accurate. Not in pipeline or presets. Context window stale: 128K vs 164K.
Recommended action: Update input to $0.21/M.
qwen/qwen2.5-coder-7b-instruct (+67% input, −44% output): MIXED
Registry: $0.05/$0.05. Actual: $0.03/$0.09. Not in pipeline or presets. Small absolute amounts.
Recommended action: Update to $0.03/$0.09. Low urgency.
Models with Moderate Drift (5–25%)
- deepseek/deepseek-r1-0528: Input overcharged +22% ($0.55 vs $0.45). Output accurate. Not in active pipeline.
- moonshotai/kimi-k2-thinking: Output absorbing −20% ($2.00 vs $2.50). Input accurate. Not in pipeline.
- deepseek/deepseek-chat-v3.1: Minor mixed drift (~7% each direction). Not in pipeline.
Context Window Mismatches (Notable)
| Model | Registered | Actual | Factor |
|---|---|---|---|
| google/gemini-2.5-flash-image | 1,000,000 | 32,768 | 30× over |
| google/gemini-3-pro-image-preview | 1,000,000 | 65,536 | 15× over |
| openai/gpt-5 | 128,000 | 400,000 | 3× under |
| openai/gpt-5-image | 128,000 | 400,000 | 3× under |
| openai/gpt-5-image-mini | 128,000 | 400,000 | 3× under |
| openai/gpt-5.1 | 272,000 | 400,000 | 1.5× under |
| x-ai/grok-4.1-fast | 128,000 | 2,000,000 | 15.6× under |
| meta-llama/llama-guard-4-12b | 32,000 | 163,840 | 5× under |
| deepseek/deepseek-v3.2 | 64,000 | 163,840 | 2.6× under |
| deepseek/deepseek-r1-0528 | 64,000 | 163,840 | 2.6× under |
| google/gemini-3.1-flash-image-preview | 131,000 | 65,536 | 2× over |
| x-ai/grok-code-fast-1 | 128,000 | 256,000 | 2× under |
Note: x-ai/grok-4.1-fast is in the active pipeline (twitter_search, native_platform_search) with a context window registered at 128K but actually 2M. This doesn't affect billing but may cause unnecessary prompt truncation.
deepseek/deepseek-v3.2 is also pipeline-active (10 stages) with context window 64K vs actual 164K.
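The Factor column in the table above is the ratio of the larger window to the smaller one, labeled by whether the registry over- or under-states the real limit. A small sketch, with a hypothetical helper name:

```python
def window_mismatch(registered: int, actual: int) -> tuple[float, str]:
    """Return (ratio, direction) for a registered vs actual context window.

    "over" means the registry claims more context than the model has;
    "under" means the registry claims less.
    """
    if registered == actual:
        return 1.0, "match"
    if registered > actual:
        return registered / actual, "over"
    return actual / registered, "under"


# Two rows from the table above.
for slug, registered, actual in [
    ("x-ai/grok-4.1-fast", 128_000, 2_000_000),
    ("google/gemini-2.5-flash-image", 1_000_000, 32_768),
]:
    ratio, direction = window_mismatch(registered, actual)
    print(f"{slug}: {ratio:.1f}x {direction}")
```

The two directions carry different risks: an "under" entry can only cause unnecessary truncation, while an "over" entry can let a request exceed the model's real limit.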
Section 3: Models Missing from OpenRouter
These 14 registered models were not found in the OpenRouter API catalog. Any code path that routes to them will fail at runtime unless caught by fallback logic.
| Registry Key | OpenRouter ID | Deprecated? | In Pipeline? | Risk |
|---|---|---|---|---|
| claude-3.5-sonnet | anthropic/claude-3.5-sonnet | Yes | No | Low — deprecated, fallbacks exist |
| claude-haiku-3.5 | anthropic/claude-haiku-3.5 | Yes | No | Low — deprecated |
| gemini-3-pro-preview | google/gemini-3-pro-preview | No | No | Medium — active registry entry, not deprecated |
| gemini-2.0-flash | google/gemini-2.0-flash | No | No | Medium — active registry entry |
| gemini-2.0-flash-exp | google/gemini-2.0-flash-exp | Yes | No | Low — deprecated |
| gemini-pro-1.5 | google/gemini-pro-1.5 | Yes | No | Low — deprecated |
| gemini-2.5-pro-preview-06-05 | google/gemini-2.5-pro-preview-06-05 | Yes | No | Low — deprecated |
| grok-4.1 | x-ai/grok-4.1 | No | No | Medium — active registry entry |
| grok-3-fast-beta | x-ai/grok-3-fast-beta | Yes | No | Low — deprecated |
| llama-3.3-70b-versatile | groq/llama-3.3-70b-versatile | Yes | No | Low — deprecated |
| text-embedding-3-small | openai/text-embedding-3-small | No | Yes (memory_embedding) | Low — embedding models are often not listed in OpenRouter's chat model API; likely still functional via direct calls |
| seedream | bytedance-seed/seedream-4.5 | No | No | Medium — active entry, image model |
| flux2-pro | black-forest-labs/flux.2-pro | No | No | Medium — active entry, image model |
| glm-air | zhipu/glm-4.5-air | No | No | Medium — active entry |
Key concern: excluding text-embedding-3-small, which is an embedding model and expected to be absent from the chat model catalog, 6 non-deprecated models are missing from OpenRouter. gemini-3-pro-preview, gemini-2.0-flash, grok-4.1, seedream-4.5, flux.2-pro, and glm-4.5-air should be verified manually: they may have been renamed, moved to different slugs, or removed from OpenRouter's catalog.
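The missing-model check can be sketched as a set difference between registry IDs and catalog IDs. The payload layout below (a `data` list whose entries carry `id`, `context_length`, and per-token USD strings under `pricing.prompt` and `pricing.completion`) is an assumption about the /api/v1/models response and should be confirmed against the live API. The sample is a hand-written excerpt using values from this audit, and the function names are hypothetical.

```python
import json

# Hand-written excerpt in the assumed response shape; not a real API capture.
SAMPLE_CATALOG = json.loads("""
{"data": [
  {"id": "x-ai/grok-4.1-fast", "context_length": 2000000,
   "pricing": {"prompt": "0.0000002", "completion": "0.0000005"}},
  {"id": "deepseek/deepseek-v3.2", "context_length": 163840,
   "pricing": {"prompt": "0.00000026", "completion": "0.00000038"}}
]}
""")


def catalog_prices_per_million(catalog: dict) -> dict:
    """Map model id -> (input, output) USD per million tokens."""
    prices = {}
    for entry in catalog["data"]:
        pricing = entry["pricing"]
        prices[entry["id"]] = (
            float(pricing["prompt"]) * 1_000_000,
            float(pricing["completion"]) * 1_000_000,
        )
    return prices


def missing_from_catalog(registry_ids, catalog: dict) -> list:
    """Registry IDs that do not appear in the catalog, sorted."""
    listed = {entry["id"] for entry in catalog["data"]}
    return sorted(set(registry_ids) - listed)


registry_ids = [
    "x-ai/grok-4.1-fast",
    "x-ai/grok-4.1",
    "deepseek/deepseek-v3.2",
    "zhipu/glm-4.5-air",
]
print(missing_from_catalog(registry_ids, SAMPLE_CATALOG))
print(catalog_prices_per_million(SAMPLE_CATALOG))
```

An exact-match check like this cannot tell a renamed slug from a removed model, which is why the non-deprecated misses still need manual verification.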
Section 4: Summary Statistics
| Metric | Value |
|---|---|
| Total models in registry | 54 |
| Found on OpenRouter | 40 |
| Missing from OpenRouter | 14 |
| Accurate pricing (±5%) | 21 (52.5% of found) |
| Moderate drift (5–25%) | 3 (7.5%) |
| Severe drift (>25%) | 16 (40%) |
| Overcharging users (severe) | 3 models |
| Platform absorbing cost (severe) | 11 models |
| Mixed direction (severe) | 2 models |
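The direction labels used in this audit (overcharging, absorbing, mixed) follow from the signs of the two drift values. A minimal sketch, assuming that drift inside the ±5% accuracy band is ignored when deciding direction; the audit does not state this rule explicitly, but it matches how deepseek-v3.1-terminus (−33.3% input, +1.3% output) is labeled absorbing rather than mixed.

```python
def direction(in_drift: float, out_drift: float, tolerance: float = 5.0) -> str:
    """Label a model by which way its pricing errs.

    Assumption: drift within +/- tolerance percent is treated as accurate
    and does not influence the label.
    """
    over = in_drift > tolerance or out_drift > tolerance
    under = in_drift < -tolerance or out_drift < -tolerance
    if over and under:
        return "mixed"
    if over:
        return "overcharging"
    if under:
        return "absorbing"
    return "accurate"


# Drift values taken from the summary table in Section 1.
print(direction(-50.0, 150.0))  # openai/gpt-5-image-mini
print(direction(140.0, 20.0))   # openai/gpt-5
print(direction(-33.3, 1.3))    # deepseek/deepseek-v3.1-terminus
```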
Active Pipeline / Preset Impact
All 9 models in PIPELINE_ASSIGNMENTS that were found on OpenRouter have accurate pricing (within ±5%). The daily sync_model_prices Inngest function (agent_app/inngest_functions/sync_model_prices.py) syncs OpenRouter prices to the model_prices DB table, and get_model_cost() in model_registry.py checks DB prices first. This means live billing uses DB-synced prices, not hardcoded values, for any model the sync has processed.
However, the hardcoded values still matter because:
1. They are the fallback when DB prices are unavailable (cold start, DB outage, new model added between syncs)
2. They are used to build MODEL_COSTS dicts consumed by CostTracker and RateCardService for pre-execution estimates
3. The sync_model_prices function only syncs non-deprecated models, so deprecated model costs always use hardcoded values
4. The drift alerting threshold in sync_model_prices is 20%; several models exceed this and should be triggering warnings in logs
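The lookup order and the alert threshold described above can be illustrated as follows. This is not the code in model_registry.py or sync_model_prices.py: `resolve_cost`, `exceeds_alert_threshold`, and the `HARDCODED` dict are hypothetical stand-ins, and only the 20% threshold and the "DB first, hardcoded fallback" order come from this audit.

```python
from typing import Optional

DRIFT_ALERT_THRESHOLD_PCT = 20.0  # threshold stated in this audit

# Hypothetical hardcoded fallback: slug -> (input, output) USD per million tokens.
HARDCODED = {"openai/gpt-5-image": (2.50, 10.00)}


def resolve_cost(slug: str, db_prices: Optional[dict]):
    """Return (prices, source): DB-synced price first, hardcoded otherwise.

    db_prices is None when the DB is unavailable (cold start, outage).
    """
    if db_prices is not None and slug in db_prices:
        return db_prices[slug], "db"
    return HARDCODED[slug], "hardcoded"


def exceeds_alert_threshold(hardcoded, live) -> bool:
    """True if input or output drift exceeds the alert threshold."""
    return any(
        abs((h - l) / l * 100.0) > DRIFT_ALERT_THRESHOLD_PCT
        for h, l in zip(hardcoded, live)
    )


live = {"openai/gpt-5-image": (10.00, 10.00)}
print(resolve_cost("openai/gpt-5-image", live))   # DB price wins
print(resolve_cost("openai/gpt-5-image", None))   # falls back to hardcoded
print(exceeds_alert_threshold(HARDCODED["openai/gpt-5-image"],
                              live["openai/gpt-5-image"]))
```

In the fallback case the stale $2.50 input price is what gets recorded, which is why the hardcoded values are worth correcting even though synced prices normally take precedence.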
Models with Severe Drift on Live Code Paths (Priority Fixes)
| Model | Path | Direction | Max Drift |
|---|---|---|---|
| openai/gpt-5-image | Image generation (openrouter provider) | Absorbing | −75% input |
| openai/gpt-5-image-mini | Image generation (openrouter provider) | Mixed | +150% output |
| x-ai/grok-4 | Settings UI selectable | Absorbing | −93% |
| x-ai/grok-4-fast | Settings UI selectable | Absorbing | −75% |
| openai/gpt-5.1 | Settings UI selectable | Absorbing | −50% output |
| openai/gpt-5 | Settings UI selectable | Overcharging | +140% input |
Revenue Impact Estimate
Without access to usage_records from the live database, precise revenue impact cannot be calculated in this audit. However:
- Image generation models (gpt-5-image, gpt-5-image-mini) are on every image generation path. If the hardcoded fallback prices are used (rather than DB-synced prices), the platform records costs at 25% of actuals for gpt-5-image input tokens.
- grok-4 has the largest percentage drift (−93%) but is not in default presets, limiting exposure to admin-override usage only.
- All default preset and pipeline models have accurate pricing, which means the bulk of regular traffic is correctly priced.
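Once usage_records are available, the unrecorded cost could be estimated by re-pricing each record at both rates. The sketch below assumes a record shape (model, input_tokens, output_tokens) that is hypothetical, and the token counts are invented for illustration. It only applies to calls that were billed from the hardcoded fallback rather than DB-synced prices.

```python
# USD per million tokens (input, output), from the summary table.
REGISTERED = {"openai/gpt-5-image": (2.50, 10.00), "x-ai/grok-4": (0.20, 1.50)}
ACTUAL = {"openai/gpt-5-image": (10.00, 10.00), "x-ai/grok-4": (3.00, 15.00)}


def cost(prices: dict, record: dict) -> float:
    """USD cost of one usage record under the given price table."""
    p_in, p_out = prices[record["model"]]
    return (record["input_tokens"] * p_in
            + record["output_tokens"] * p_out) / 1_000_000


def unrecorded_cost(records) -> float:
    """Sum of actual minus recorded cost; positive means the platform absorbs."""
    return sum(cost(ACTUAL, r) - cost(REGISTERED, r) for r in records)


# Invented records, for illustration only.
records = [
    {"model": "openai/gpt-5-image", "input_tokens": 1_000_000, "output_tokens": 0},
    {"model": "x-ai/grok-4", "input_tokens": 1_000_000, "output_tokens": 1_000_000},
]
print(f"unrecorded cost: ${unrecorded_cost(records):.2f}")
```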
Recommended Follow-Up Actions
- Immediate: Update hardcoded prices for gpt-5-image and gpt-5-image-mini; these are on live execution paths
- High priority: Update grok-4, grok-4-fast, gpt-5, gpt-5.1 (available in settings UI)
- Medium priority: Update cogito-v2.1-671b, gemini-2.5-flash, gemini-2.5-flash-image, gemini-3-pro-image-preview, claude-opus-4.5
- Low priority: Update remaining moderate-drift models (deepseek-r1-0528, kimi-k2-thinking, etc.)
- Investigate: Verify whether the 6 non-deprecated missing models have been renamed on OpenRouter or should be removed/deprecated
- Context windows: Update stale context windows, especially grok-4.1-fast (pipeline-active, 128K→2M) and deepseek-v3.2 (pipeline-active, 64K→164K)
- Process: Confirm that the sync_model_prices Inngest job is running successfully and that drift alerts are being monitored