LLM Weather Report

Tracking raw LLM reasoning drift — pure endpoint, no agents

Latest Report

June 10, 2026 — 8:56 PM CT

gpt-5.4-mini lost spatial-1 (now 3.5, was 5). gemini-2.5-flash dropped on causality-1 (now 1.83, was 3.5) and is failing it.

Scorecard

Model                        ambiguity-1  causality-1       code-1    common-sense-1  logic-1   math-1    spatial-1
anthropic/claude-haiku-4-5   ✓ (4.33)     ✓ (4.67)          ✓ (4.5)   ✓ (3.33)        ✓ (5)     ✓ (5)     ✓ (5)
anthropic/claude-opus-4-6    ✓ (5)        ✓ (4.83)          ✓ (5)     ✓ (4.33)        ✓ (5)     ✓ (5)     ✓ (5)
anthropic/claude-sonnet-4-6  ✓ (4.83)     ✓ (4.67)          ✓ (4.33)  ✓ (3.5)         ✓ (4.83)  ✓ (5)     ✓ (5)
gemini/gemini-2.5-flash      ✓ (4.5)      ✗ (1.83) was 3.5  ✓ (4.83)  ✓ (4.83)        ✓ (5)     ✓ (5)     ✓ (5)
gemini/gemini-2.5-pro        ✓ (4.33)     ✓ (5)             ✓ (4.83)  ✓ (5)           ✓ (5)     ✓ (5)     ✓ (5)
ollama/llama3
openai/gpt-5.4               ✓ (4.5)      ✓ (4.83)          ✓ (4.67)  ✓ (4.33)        ✓ (5)     ✓ (5)     ✓ (5)
openai/gpt-5.4-mini          ✓ (4.67)     ✓ (4.83)          ✓ (4.83)  ✓ (4.33)        ✓ (5)     ✓ (5)     ✗ (3.5) was ✓ (5)

Stay Updated

Get notified when models drift. Join the 2389 mailing list for updates on this project and what we're building. We only use your email for project updates — no spam, unsubscribe anytime.