Model Registry Pricing Audit — April 2026

Date: 2026-04-09
Source: agent_app/routing/model_registry.py vs OpenRouter /api/v1/models API
Methodology: Registered USD/M-token costs compared against live OpenRouter pricing. Drift = (registered − actual) / actual × 100. Positive = overcharging users. Negative = platform absorbs cost.
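A minimal Python sketch of the drift formula and the status buckets used in Section 1, for reproducing individual rows. The function names are illustrative rather than taken from model_registry.py. Two details are assumptions because the legend does not pin them down: status comes from the larger of the input and output absolute drifts, and a drift of exactly 5% or 25% falls in the lower bucket.

```python
def drift_pct(registered: float, actual: float) -> float:
    """Drift = (registered - actual) / actual * 100.

    Positive: users are overcharged. Negative: the platform absorbs cost.
    """
    if actual <= 0:
        raise ValueError("actual price must be positive")
    return (registered - actual) / actual * 100


def status(reg_in: float, act_in: float, reg_out: float, act_out: float) -> str:
    """Bucket a model by its worst (largest absolute) input or output drift.

    Assumption: exact boundary values (5%, 25%) fall in the lower bucket.
    """
    worst = max(abs(drift_pct(reg_in, act_in)), abs(drift_pct(reg_out, act_out)))
    if worst <= 5:
        return "ok"        # ✅ within ±5%
    if worst <= 25:
        return "moderate"  # ⚠️ 5–25%
    return "severe"        # ❌ >25%


if __name__ == "__main__":
    # anthropic/claude-opus-4.5: registered $15/$75, actual $5/$25
    print(round(drift_pct(15.00, 5.00), 1))       # 200.0
    print(status(15.00, 5.00, 75.00, 25.00))      # severe
    # deepseek/deepseek-v3.2: registered $0.25/$0.38, actual $0.26/$0.38
    print(status(0.25, 0.26, 0.38, 0.38))         # ok
```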

Section 1: Summary Table

Sorted by maximum absolute drift (input or output), descending. Status: ✅ within ±5%, ⚠️ 5–25%, ❌ >25%.

Model Slug	Reg In $/M	Actual In $/M	In Drift %	Reg Out $/M	Actual Out $/M	Out Drift %	Status
anthropic/claude-opus-4.5	15.00	5.00	+200.0%	75.00	25.00	+200.0%	❌
openai/gpt-5-image-mini	1.25	2.50	−50.0%	5.00	2.00	+150.0%	❌
openai/gpt-5	3.00	1.25	+140.0%	12.00	10.00	+20.0%	❌
x-ai/grok-4	0.20	3.00	−93.3%	1.50	15.00	−90.0%	❌
deepcogito/cogito-v2.1-671b	0.14	1.25	−88.8%	0.80	1.25	−36.0%	❌
google/gemini-2.5-flash-image	0.10	0.30	−66.7%	0.40	2.50	−84.0%	❌
x-ai/grok-code-fast-1	0.05	0.20	−75.0%	0.25	1.50	−83.3%	❌
google/gemini-2.5-flash	0.15	0.30	−50.0%	0.60	2.50	−76.0%	❌
openai/gpt-5-image	2.50	10.00	−75.0%	10.00	10.00	0.0%	❌
x-ai/grok-4-fast	0.05	0.20	−75.0%	0.25	0.50	−50.0%	❌
qwen/qwen2.5-coder-7b-instruct	0.05	0.03	+66.7%	0.05	0.09	−44.4%	❌
google/gemini-3-pro-image-preview	1.25	2.00	−37.5%	5.00	12.00	−58.3%	❌
moonshotai/kimi-k2-0905	0.60	0.40	+50.0%	2.00	2.00	0.0%	❌
openai/gpt-5.1	1.25	1.25	0.0%	5.00	10.00	−50.0%	❌
deepseek/deepseek-v3.1-terminus	0.14	0.21	−33.3%	0.80	0.79	+1.3%	❌
deepseek/deepseek-chat-v3-0324	0.14	0.20	−30.0%	0.80	0.77	+3.9%	❌
deepseek/deepseek-r1-0528	0.55	0.45	+22.2%	2.19	2.15	+1.9%	⚠️
moonshotai/kimi-k2-thinking	0.60	0.60	0.0%	2.00	2.50	−20.0%	⚠️
deepseek/deepseek-chat-v3.1	0.14	0.15	−6.7%	0.80	0.75	+6.7%	⚠️
deepseek/deepseek-v3.2	0.25	0.26	−3.8%	0.38	0.38	0.0%	✅
anthropic/claude-opus-4.6	5.00	5.00	0.0%	25.00	25.00	0.0%	✅
anthropic/claude-sonnet-4.6	3.00	3.00	0.0%	15.00	15.00	0.0%	✅
anthropic/claude-sonnet-4.5	3.00	3.00	0.0%	15.00	15.00	0.0%	✅
anthropic/claude-haiku-4.5	1.00	1.00	0.0%	5.00	5.00	0.0%	✅
anthropic/claude-sonnet-4	3.00	3.00	0.0%	15.00	15.00	0.0%	✅
anthropic/claude-opus-4	15.00	15.00	0.0%	75.00	75.00	0.0%	✅
anthropic/claude-3-haiku	0.25	0.25	0.0%	1.25	1.25	0.0%	✅
google/gemini-3.1-pro-preview	2.00	2.00	0.0%	12.00	12.00	0.0%	✅
google/gemini-3-flash-preview	0.50	0.50	0.0%	3.00	3.00	0.0%	✅
google/gemini-2.5-flash-lite	0.10	0.10	0.0%	0.40	0.40	0.0%	✅
google/gemini-2.0-flash-001	0.10	0.10	0.0%	0.40	0.40	0.0%	✅
google/gemini-3.1-flash-image-preview	0.50	0.50	0.0%	3.00	3.00	0.0%	✅
x-ai/grok-4.1-fast	0.20	0.20	0.0%	0.50	0.50	0.0%	✅
meta-llama/llama-4-maverick	0.15	0.15	0.0%	0.60	0.60	0.0%	✅
meta-llama/llama-guard-4-12b	0.18	0.18	0.0%	0.18	0.18	0.0%	✅
openai/gpt-5.4	2.50	2.50	0.0%	15.00	15.00	0.0%	✅
openai/o3	2.00	2.00	0.0%	8.00	8.00	0.0%	✅
openai/gpt-4o	2.50	2.50	0.0%	10.00	10.00	0.0%	✅
openai/gpt-3.5-turbo	0.50	0.50	0.0%	1.50	1.50	0.0%	✅
xiaomi/mimo-v2-flash	0.09	0.09	0.0%	0.29	0.29	0.0%	✅

Section 2: Per-Model Notes

URGENT — Models on Active Code Paths with Severe Drift

openai/gpt-5-image (−75% input drift) — ABSORBING COST

Registry lists $2.50/M input, but OpenRouter charges $10.00/M; output ($10.00/M) is accurate. Used in agent_app/virtual_run/image_providers/openrouter.py as the primary OpenRouter image generation model. Whenever the hardcoded price is used rather than a DB-synced one (see Active Pipeline / Preset Impact in Section 4), input tokens on every image generation call cost the platform 4× what is recorded. The context window is also stale: registered 128K vs actual 400K.

Recommended action: Update input cost to $10.00/M immediately. This is a live execution path.

openai/gpt-5-image-mini (−50% input, +150% output) — MIXED

Registry: $1.25/$5.00. Actual: $2.50/$2.00. Used as the secondary image model in image_providers/openrouter.py. Input is under-recorded (the platform absorbs the difference) and output is over-recorded (users are overcharged). The net effect therefore depends on the output-to-input token ratio of image calls.
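Because the two sides drift in opposite directions, the net effect per call can be worked out from token counts. A sketch, assuming cost is linear in input and output tokens at the per-million rates above (any per-image or per-request fees are ignored); the helper names are illustrative. Setting registered and actual cost equal gives the break-even output/input ratio (act_in − reg_in) / (reg_out − act_out).

```python
def call_cost_usd(in_rate: float, out_rate: float, in_tokens: int, out_tokens: int) -> float:
    """Cost of one call, given $/M-token rates."""
    return (in_rate * in_tokens + out_rate * out_tokens) / 1_000_000


def net_drift_pct(reg: tuple[float, float], act: tuple[float, float],
                  in_tokens: int, out_tokens: int) -> float:
    """Net drift of one call. Positive: user overcharged. Negative: platform absorbs."""
    registered = call_cost_usd(*reg, in_tokens, out_tokens)
    actual = call_cost_usd(*act, in_tokens, out_tokens)
    return (registered - actual) / actual * 100


def break_even_ratio(reg: tuple[float, float], act: tuple[float, float]) -> float:
    """Output/input token ratio at which registered cost equals actual cost.

    Solves reg_in*I + reg_out*O == act_in*I + act_out*O for O/I.
    A negative result means both sides drift the same way (no break-even).
    """
    (reg_in, reg_out), (act_in, act_out) = reg, act
    if reg_out == act_out:
        raise ValueError("output prices are equal; there is no break-even ratio")
    return (act_in - reg_in) / (reg_out - act_out)


if __name__ == "__main__":
    REG = (1.25, 5.00)  # openai/gpt-5-image-mini, registered $/M (input, output)
    ACT = (2.50, 2.00)  # OpenRouter actual $/M (input, output)
    print(round(break_even_ratio(REG, ACT), 3))               # 0.417
    print(round(net_drift_pct(REG, ACT, 10_000, 1_000), 1))   # -35.2 (platform absorbs)
    print(round(net_drift_pct(REG, ACT, 1_000, 4_000), 1))    # 102.4 (user overcharged)
```

Above a ratio of about 0.42 (5/12 output tokens per input token), users pay more than OpenRouter charges; below it, the platform absorbs the difference.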

Recommended action: Update to $2.50/$2.00. This is a live execution path.

x-ai/grok-4 (−93% input, −90% output) — ABSORBING COST

Registry: $0.20/$1.50. Actual: $3.00/$15.00. Listed in settings UI (model_presets.py:732) and model_config.py:169 as an available model. Not in default presets or pipeline, but if an admin selects it, the platform absorbs roughly 90–93% of the true cost. The registered $0.20/$1.50 is identical to the actual grok-code-fast-1 price in Section 1, so it may have been copied from that entry rather than reflecting an earlier Grok 4 price.

Recommended action: Update to $3.00/$15.00 or remove from settings UI if no longer cost-competitive.

x-ai/grok-4-fast (−75% input, −50% output) — ABSORBING COST

Registry: $0.05/$0.25. Actual: $0.20/$0.50. Listed in settings UI (model_presets.py:708) and model_config.py:155. Same pattern as grok-4: the registered price was never reconciled with the current OpenRouter rate.

Recommended action: Update to $0.20/$0.50.

openai/gpt-5.1 (−50% output drift) — ABSORBING COST

Registry: $5.00/M output. Actual: $10.00/M output. Input is accurate ($1.25). Listed in settings UI (model_presets.py:454). Context window also stale: registered 272K vs actual 400K. Platform absorbs half the output cost if a user selects this model.

Recommended action: Update output cost to $10.00/M and context window to 400K.

openai/gpt-5 (+140% input drift) — OVERCHARGING USERS

Registry: $3.00/M input. Actual: $1.25/M input. Output drift is moderate: registered $12.00 vs actual $10.00 (+20%). Not in any default preset or active pipeline stage. Referenced in verify_gpt52_model.py (test file) and legacy code only. The context window is also stale: registered 128K vs actual 400K.

Recommended action: Update to $1.25/$10.00 or mark deprecated if not needed.

Non-Pipeline Models with Severe Drift

anthropic/claude-opus-4.5 (+200% input and output) — OVERCHARGING

Registry: $15.00/$75.00. Actual: $5.00/$25.00. Marked deprecated=True. The registry entry carries the old Claude Opus 4 pricing ($15/$75), but Anthropic priced Opus 4.5 at $5/$25. Any code path still referencing this model records 3× the true cost. Not in pipeline or presets.

Recommended action: Update to $5.00/$25.00 even though the model is deprecated. Because sync_model_prices only syncs non-deprecated models, this hardcoded value is the one used for any cost records the model still produces.

deepcogito/cogito-v2.1-671b (−89% input, −36% output) — ABSORBING COST

Registry: $0.14/$0.80. Actual: $1.25/$1.25. Not in pipeline or presets. The registered price appears copied from DeepSeek V3 pricing, not the actual Cogito model price. If selected via admin override, the platform absorbs ~89% of input cost.

Recommended action: Update to $1.25/$1.25.

google/gemini-2.5-flash (−50% input, −76% output) — ABSORBING COST

Registry: $0.15/$0.60. Actual: $0.30/$2.50. Not in active pipeline (superseded by Gemini 3 variants). Referenced in capability_requirements.py and legacy chat app. The output price is particularly stale — registered at $0.60 vs actual $2.50 (4.2× under).

Recommended action: Update to $0.30/$2.50 or mark deprecated.

google/gemini-2.5-flash-image (−67% input, −84% output) — ABSORBING COST

Registry: $0.10/$0.40. Actual: $0.30/$2.50. Context window severely stale: registered 1M vs actual 32K. The pricing was likely copied from the non-image Gemini 2.0 Flash entry.

Recommended action: Update pricing to $0.30/$2.50, context window to 32,768.

google/gemini-3-pro-image-preview (−38% input, −58% output) — ABSORBING COST

Registry: $1.25/$5.00. Actual: $2.00/$12.00. Context window stale: registered 1M vs actual 65K. Used as an image model option.

Recommended action: Update to $2.00/$12.00, context window to 65,536.

x-ai/grok-code-fast-1 (−75% input, −83% output) — ABSORBING COST

Registry: $0.05/$0.25. Actual: $0.20/$1.50. Not in pipeline or presets. Context window stale: 128K vs 256K.

Recommended action: Update to $0.20/$1.50.

moonshotai/kimi-k2-0905 (+50% input) — OVERCHARGING

Registry: $0.60/M input. Actual: $0.40/M. Output accurate. Not in pipeline or presets.

Recommended action: Update input to $0.40/M.

deepseek/deepseek-v3.1-terminus (−33% input) — ABSORBING COST

Registry: $0.14/M input. Actual: $0.21/M. Output accurate. Not in pipeline or presets. Context window stale: 128K vs 164K.

Recommended action: Update input to $0.21/M.

qwen/qwen2.5-coder-7b-instruct (+67% input, −44% output) — MIXED

Registry: $0.05/$0.05. Actual: $0.03/$0.09. Not in pipeline or presets. Small absolute amounts.

Recommended action: Update to $0.03/$0.09. Low urgency.

Models with Moderate Drift (5–25%)

deepseek/deepseek-r1-0528: Input overcharged +22.2% ($0.55 vs $0.45). Output accurate (+1.9%). Not in active pipeline.
moonshotai/kimi-k2-thinking: Output under-recorded −20% ($2.00 vs $2.50), so the platform absorbs the difference. Input accurate. Not in pipeline.
deepseek/deepseek-chat-v3.1: Minor mixed drift: input −6.7% ($0.14 vs $0.15), output +6.7% ($0.80 vs $0.75). Not in pipeline.

Context Window Mismatches (Notable)

Model	Registered (tokens)	Actual (tokens)	Factor (registered vs actual)
google/gemini-2.5-flash-image	1,000,000	32,768	30.5× over
google/gemini-3-pro-image-preview	1,000,000	65,536	15.3× over
openai/gpt-5	128,000	400,000	3.1× under
openai/gpt-5-image	128,000	400,000	3.1× under
openai/gpt-5-image-mini	128,000	400,000	3.1× under
openai/gpt-5.1	272,000	400,000	1.5× under
x-ai/grok-4.1-fast	128,000	2,000,000	15.6× under
meta-llama/llama-guard-4-12b	32,000	163,840	5.1× under
deepseek/deepseek-v3.2	64,000	163,840	2.6× under
deepseek/deepseek-r1-0528	64,000	163,840	2.6× under
google/gemini-3.1-flash-image-preview	131,000	65,536	2.0× over
x-ai/grok-code-fast-1	128,000	256,000	2.0× under

Note: x-ai/grok-4.1-fast is in the active pipeline (twitter_search, native_platform_search) with a context window registered at 128K but actually 2M. This doesn't affect billing but may cause unnecessary prompt truncation.

deepseek/deepseek-v3.2 is also pipeline-active (10 stages), with its context window registered at 64K vs an actual 164K, so the same truncation risk applies.
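The Factor column can be reproduced with a small comparison helper; the function name is illustrative. The two directions plausibly carry different risks, assuming the registered window is what sizes prompts: an under-registered window (grok-4.1-fast, deepseek-v3.2) truncates prompts unnecessarily, while an over-registered one (gemini-2.5-flash-image) could let a prompt exceed the model's real limit.

```python
def window_mismatch(registered: int, actual: int) -> tuple[float, str]:
    """Compare a registered context window with the actual one.

    Returns (factor, direction): factor >= 1, rounded to one decimal, and
    direction "over" when the registry claims more context than the model has.
    """
    if registered <= 0 or actual <= 0:
        raise ValueError("context windows must be positive")
    if registered == actual:
        return 1.0, "match"
    if registered > actual:
        return round(registered / actual, 1), "over"
    return round(actual / registered, 1), "under"


if __name__ == "__main__":
    print(window_mismatch(1_000_000, 32_768))   # (30.5, 'over')  gemini-2.5-flash-image
    print(window_mismatch(128_000, 2_000_000))  # (15.6, 'under') grok-4.1-fast
```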

Section 3: Models Missing from OpenRouter

These 14 registered models were not found in the OpenRouter API catalog. Any code path that routes to them through OpenRouter is likely to fail at runtime unless fallback logic catches the error.

Registry Key	OpenRouter ID	Deprecated?	In Pipeline?	Risk
claude-3.5-sonnet	anthropic/claude-3.5-sonnet	Yes	No	Low — deprecated, fallbacks exist
claude-haiku-3.5	anthropic/claude-haiku-3.5	Yes	No	Low — deprecated
gemini-3-pro-preview	google/gemini-3-pro-preview	No	No	Medium — active registry entry, not deprecated
gemini-2.0-flash	google/gemini-2.0-flash	No	No	Medium — active registry entry
gemini-2.0-flash-exp	google/gemini-2.0-flash-exp	Yes	No	Low — deprecated
gemini-pro-1.5	google/gemini-pro-1.5	Yes	No	Low — deprecated
gemini-2.5-pro-preview-06-05	google/gemini-2.5-pro-preview-06-05	Yes	No	Low — deprecated
grok-4.1	x-ai/grok-4.1	No	No	Medium — active registry entry
grok-3-fast-beta	x-ai/grok-3-fast-beta	Yes	No	Low — deprecated
llama-3.3-70b-versatile	groq/llama-3.3-70b-versatile	Yes	No	Low — deprecated
text-embedding-3-small	openai/text-embedding-3-small	No	Yes (memory_embedding)	Low — embedding models are often not listed in OpenRouter's chat model API; likely still functional via direct calls
seedream	bytedance-seed/seedream-4.5	No	No	Medium — active entry, image model
flux2-pro	black-forest-labs/flux.2-pro	No	No	Medium — active entry, image model
glm-air	zhipu/glm-4.5-air	No	No	Medium — active entry

Key concern: 7 non-deprecated models are missing from OpenRouter. Setting aside text-embedding-3-small (likely still reachable via direct calls), the other 6 (gemini-3-pro-preview, gemini-2.0-flash, grok-4.1, seedream-4.5, flux.2-pro, and glm-4.5-air) should be verified manually: they may have been renamed, moved to different slugs, or removed from OpenRouter's catalog.
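The missing-model check can be sketched against the /api/v1/models endpoint used for this audit. This assumes the response shape {"data": [{"id", "context_length", "pricing": {"prompt", "completion"}}]} with per-token USD prices as strings (worth confirming against the current OpenRouter API reference), and it models the registry as a plain set of OpenRouter IDs rather than the real model_registry.py structure. The rename hint uses difflib string similarity: it only surfaces candidates, such as google/gemini-2.0-flash-001 for google/gemini-2.0-flash, and does not confirm a rename.

```python
import difflib
import json
import urllib.request

OPENROUTER_MODELS_URL = "https://openrouter.ai/api/v1/models"


def fetch_catalog(url: str = OPENROUTER_MODELS_URL) -> dict:
    """Download the OpenRouter model list (network call; not used in the example)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def index_catalog(payload: dict) -> dict[str, tuple[float, float, int]]:
    """Map model id -> (input $/M, output $/M, context_length).

    Prices arrive per token as strings, so they are scaled by 1e6 and rounded
    to hide float noise (e.g. "0.0000025" -> 2.5).
    """
    index = {}
    for model in payload.get("data", []):
        pricing = model.get("pricing", {})
        index[model["id"]] = (
            round(float(pricing.get("prompt", 0)) * 1_000_000, 6),
            round(float(pricing.get("completion", 0)) * 1_000_000, 6),
            int(model.get("context_length") or 0),
        )
    return index


def missing_models(registry_ids: set[str], catalog: dict) -> list[str]:
    """Registered OpenRouter ids that have no catalog entry, sorted."""
    return sorted(set(registry_ids).difference(catalog))


def suggest_renames(missing: list[str], catalog: dict, cutoff: float = 0.8) -> dict[str, list[str]]:
    """Closest catalog ids by string similarity: candidates to verify manually."""
    ids = list(catalog)
    return {slug: difflib.get_close_matches(slug, ids, n=3, cutoff=cutoff) for slug in missing}


if __name__ == "__main__":
    # Illustrative payload; prices match the Section 1 row, context_length is made up.
    sample = {"data": [
        {"id": "google/gemini-2.0-flash-001", "context_length": 1_000_000,
         "pricing": {"prompt": "0.0000001", "completion": "0.0000004"}},
    ]}
    catalog = index_catalog(sample)
    gone = missing_models({"google/gemini-2.0-flash", "google/gemini-2.0-flash-001"}, catalog)
    print(gone)                            # ['google/gemini-2.0-flash']
    print(suggest_renames(gone, catalog))  # {'google/gemini-2.0-flash': ['google/gemini-2.0-flash-001']}
```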

Section 4: Summary Statistics

Metric	Value
Total models in registry	54
Found on OpenRouter	40
Missing from OpenRouter	14
Accurate pricing (±5%)	21 (52.5% of found)
Moderate drift (5–25%)	3 (7.5%)
Severe drift (>25%)	16 (40%)
Overcharging users (severe)	3 models
Platform absorbing cost (severe)	11 models
Mixed direction (severe)	2 models

Active Pipeline / Preset Impact

All 9 models in PIPELINE_ASSIGNMENTS that were found on OpenRouter have accurate pricing (within ±5%). The daily sync_model_prices Inngest function (agent_app/inngest_functions/sync_model_prices.py) syncs OpenRouter prices to the model_prices DB table, and get_model_cost() in model_registry.py checks DB prices first. This means live billing uses DB-synced prices, not hardcoded values, for any model the sync has processed.

However, the hardcoded values still matter because:

1. They are the fallback when DB prices are unavailable (cold start, DB outage, or a new model added between syncs).
2. They are used to build the MODEL_COSTS dicts consumed by CostTracker and RateCardService for pre-execution estimates.
3. The sync_model_prices function only syncs non-deprecated models, so deprecated models always use hardcoded costs.
4. The drift alerting threshold in sync_model_prices is 20%. By the Section 1 table, 17 of the 40 found models exceed it on at least one side (the 16 severe-drift models plus deepseek-r1-0528), and kimi-k2-thinking sits exactly at −20%; those the sync processes should be triggering warnings in logs.
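The precedence described above (DB-synced price first, hardcoded value as fallback, deprecated models never synced) and the 20% drift check can be sketched as follows. This is an illustration, not the actual get_model_cost() or sync_model_prices code: the two dicts stand in for the model_prices table and the hardcoded registry, the Price type and function names are hypothetical, and the alert is assumed to compare each hardcoded price with the freshly synced one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Price:
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens


def resolve_price(slug: str, db_prices: Optional[dict[str, Price]],
                  hardcoded: dict[str, Price]) -> tuple[Price, str]:
    """Return (price, source). A DB-synced price wins; otherwise fall back.

    db_prices=None models a DB outage or cold start. Raises KeyError for a
    slug known to neither source.
    """
    if db_prices is not None and slug in db_prices:
        return db_prices[slug], "db"
    return hardcoded[slug], "hardcoded"


def drift_alerts(db_prices: dict[str, Price], hardcoded: dict[str, Price],
                 threshold_pct: float = 20.0) -> list[str]:
    """Synced slugs whose hardcoded input or output price drifts past the threshold."""
    alerts = []
    for slug, synced in db_prices.items():
        fallback = hardcoded.get(slug)
        if fallback is None:
            continue
        pairs = ((fallback.input_per_m, synced.input_per_m),
                 (fallback.output_per_m, synced.output_per_m))
        if any(act > 0 and abs(reg - act) / act * 100 > threshold_pct for reg, act in pairs):
            alerts.append(slug)
    return sorted(alerts)


# Example data from the Section 1 table. claude-opus-4.5 is deprecated,
# so it has no synced DB row.
HARDCODED = {
    "openai/gpt-5-image": Price(2.50, 10.00),
    "openai/gpt-4o": Price(2.50, 10.00),
    "anthropic/claude-opus-4.5": Price(15.00, 75.00),
}
DB = {
    "openai/gpt-5-image": Price(10.00, 10.00),
    "openai/gpt-4o": Price(2.50, 10.00),
}

if __name__ == "__main__":
    print(resolve_price("openai/gpt-5-image", DB, HARDCODED)[1])         # db
    print(resolve_price("openai/gpt-5-image", None, HARDCODED)[1])       # hardcoded
    print(resolve_price("anthropic/claude-opus-4.5", DB, HARDCODED)[1])  # hardcoded
    print(drift_alerts(DB, HARDCODED))                                   # ['openai/gpt-5-image']
```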

Models with Severe Drift on Live Code Paths (Priority Fixes)

Model	Path	Direction	Max Drift
openai/gpt-5-image	Image generation (openrouter provider)	Absorbing	−75% input
openai/gpt-5-image-mini	Image generation (openrouter provider)	Mixed	+150% output
x-ai/grok-4	Settings UI selectable	Absorbing	−93% input
x-ai/grok-4-fast	Settings UI selectable	Absorbing	−75% input
openai/gpt-5.1	Settings UI selectable	Absorbing	−50% output
openai/gpt-5	Settings UI selectable	Overcharging	+140% input

Revenue Impact Estimate

Without access to usage_records from the live database, precise revenue impact cannot be calculated in this audit. However:

Image generation models (gpt-5-image, gpt-5-image-mini) are on every image generation path. If the hardcoded fallback prices are used (rather than DB-synced prices), the platform records costs at 25% of actuals for gpt-5-image input tokens.
grok-4 has the largest percentage drift (−93%) but is not in default presets, limiting exposure to admin-override usage only.
All default preset and pipeline models found on OpenRouter have accurate pricing, so most regular traffic is likely priced correctly.
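Once usage_records can be queried, the impact could be estimated by repricing each affected record at the actual rate. A sketch under stated assumptions: the Usage record shape and its price_source field are hypothetical (the real usage_records schema was not examined for this audit), the rates come from the Section 1 table, and only records priced from hardcoded values are counted, since DB-synced records already used the live rate.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Usage(NamedTuple):
    slug: str
    input_tokens: int
    output_tokens: int
    price_source: str  # hypothetical field: "db" or "hardcoded"


# (input $/M, output $/M), from the Section 1 table
REGISTERED = {"openai/gpt-5-image": (2.50, 10.00), "openai/gpt-5-image-mini": (1.25, 5.00)}
ACTUAL = {"openai/gpt-5-image": (10.00, 10.00), "openai/gpt-5-image-mini": (2.50, 2.00)}


def absorbed_usd(records: Iterable[Usage]) -> dict[str, float]:
    """Per-model (actual - recorded) USD over records priced from hardcoded values.

    Positive: the platform absorbed cost. Negative: users were overcharged.
    """
    totals = defaultdict(float)
    for r in records:
        if r.price_source != "hardcoded" or r.slug not in ACTUAL:
            continue
        (reg_in, reg_out), (act_in, act_out) = REGISTERED[r.slug], ACTUAL[r.slug]
        totals[r.slug] += ((act_in - reg_in) * r.input_tokens
                           + (act_out - reg_out) * r.output_tokens) / 1_000_000
    return dict(totals)


if __name__ == "__main__":
    sample = [
        Usage("openai/gpt-5-image", 1_000_000, 0, "hardcoded"),
        Usage("openai/gpt-5-image", 1_000_000, 0, "db"),  # skipped: already live-priced
        Usage("openai/gpt-5-image-mini", 10_000, 1_000, "hardcoded"),
    ]
    # {'openai/gpt-5-image': 7.5, 'openai/gpt-5-image-mini': 0.0095}
    print(absorbed_usd(sample))
```

At hardcoded prices, each million gpt-5-image input tokens leaves $7.50 unrecorded, consistent with the 25% figure above.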

Recommended Follow-Up Actions

Immediate: Update hardcoded prices for gpt-5-image and gpt-5-image-mini — these are on live execution paths
High priority: Update grok-4, grok-4-fast, gpt-5, gpt-5.1 — available in settings UI
Medium priority: Update cogito-v2.1-671b, gemini-2.5-flash, gemini-2.5-flash-image, gemini-3-pro-image-preview, claude-opus-4.5
Low priority: Update remaining moderate-drift models (deepseek-r1-0528, kimi-k2-thinking, etc.)
Investigate: Verify whether the 6 non-deprecated missing models (all except text-embedding-3-small) have been renamed on OpenRouter or should be removed/deprecated
Context windows: Update stale context windows, especially grok-4.1-fast (pipeline-active, 128K→2M) and deepseek-v3.2 (pipeline-active, 64K→164K)
Process: Confirm that the sync_model_prices Inngest job is running successfully and that drift alerts are being monitored