LLM Weather Report

Tracking raw LLM reasoning drift — pure endpoint, no agents

gemini-2.5-pro lost causality-1 but recovered common-sense-1. gemini-2.5-flash recovered causality-1. gpt-5.4-mini still failing spatial-1, though its score is rising.

May 1, 2026 — 8:36 AM CT

Drift Alerts

SCORE_RISE openai/gpt-5.4-mini on spatial-1
REGRESSION gemini/gemini-2.5-pro on causality-1
IMPROVEMENT gemini/gemini-2.5-pro on common-sense-1
IMPROVEMENT gemini/gemini-2.5-flash on causality-1
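
The alert labels above can be read as transitions between two consecutive runs. The report does not state the exact rules, so the following is only a minimal sketch under assumed rules: REGRESSION when a passing prompt stops passing (including a lost result), IMPROVEMENT when a failing prompt starts passing, and SCORE_RISE when the score goes up without a change in pass/fail status. The `Result` type and `classify_drift` function are hypothetical names, not part of the tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Result:
    """One model's result on one prompt in one run (hypothetical shape)."""
    passed: Optional[bool]  # None means no result was recorded
    score: Optional[float]  # None means no score was recorded


def classify_drift(prev: Result, curr: Result) -> Optional[str]:
    """Return an alert label for a prev -> curr transition, or None.

    The rules here are assumptions inferred from the alert names,
    not the report's documented logic.
    """
    # A prompt that used to pass and now fails, or has no result at all.
    if prev.passed is True and curr.passed is not True:
        return "REGRESSION"
    # A prompt that used to fail and now passes.
    if prev.passed is False and curr.passed is True:
        return "IMPROVEMENT"
    # Same pass/fail status, but a strictly higher score.
    if (
        prev.passed == curr.passed
        and prev.score is not None
        and curr.score is not None
        and curr.score > prev.score
    ):
        return "SCORE_RISE"
    return None


if __name__ == "__main__":
    # gemini-2.5-flash on causality-1: was a fail at 2, now a pass at 3.5.
    print(classify_drift(Result(False, 2.0), Result(True, 3.5)))
    # gpt-5.4-mini on spatial-1: 2.5 -> 3.67, still failing
    # (the previous fail status is an assumption; only the score is shown).
    print(classify_drift(Result(False, 2.5), Result(False, 3.67)))
```

A real implementation would likely add a minimum score delta before raising SCORE_RISE, so that small run-to-run noise does not trigger alerts; that threshold is not given in this report.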

Provider Status

OpenAI: Elevated error rates for image generation
OpenAI: Elevated error rates affecting ChatGPT for some users in Europe
OpenAI: Elevated errors for ChatGPT Go (5.3 Thinking)
OpenAI: Partial Disruption of ChatGPT Workspace Connector Write Actions
Anthropic: Elevated errors on Claude Haiku 4.5
Anthropic: claude.ai and API unavailable
Anthropic: Elevated errors on Claude Haiku 4.5

Scorecard

Model	ambiguity-1	causality-1	code-1	common-sense-1	logic-1	math-1	spatial-1
anthropic/claude-haiku-4-5	✓ (4.5)	✓ (4.67)	✓ (4.83)	✓ (3.33)	✓ (5)	✓ (5)	✓ (5)
anthropic/claude-opus-4-6	✓ (5)	✓ (4.67)	✓ (4.83)	✓ (4.5)	✓ (5)	✓ (5)	✓ (5)
anthropic/claude-sonnet-4-6	✓ (4.33)	✓ (4.8)	✓ (4.5)	✓ (4)	✓ (5)	✓ (5)	✓ (5)
gemini/gemini-2.5-flash	✓ (4.5)	✓ (3.5), was ✗ (2)	✓ (4.67)	✓ (5)	✓ (4.83)	✓ (5)	✓ (5)
gemini/gemini-2.5-pro	✓ (4.67)	—	✓ (4.67)	✓ (5), was ✗	✓ (5)	✓ (5)	✓ (5)
ollama/llama3	—	—	—	—	—	—	—
openai/gpt-5.4	✓ (4.5)	✓ (4.67)	✓ (5)	✓ (4.33)	✓ (5)	✓ (4.6)	✓ (5)
openai/gpt-5.4-mini	✓ (4.5)	✓ (4.67)	✓ (4.67)	✓ (4.33)	✓ (4.83)	✓ (5)	✗ (3.67), was 2.5
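
Each scorecard cell packs a pass/fail mark, a score in parentheses, and sometimes a "was" note with the previous result. For anyone scraping the plain-text table rather than the JSON export, here is a rough sketch of a cell parser. The function name `parse_cell` and the returned dictionary keys are hypothetical, and the pattern is only inferred from the cells visible in this report, so other cell shapes may not match.

```python
import re
from typing import Optional

# Pattern inferred from the cells in this report (an assumption):
#   "✓ (4.5)", "—", "✓ (3.5)was ✗ (2)", "✓ (5)was ✗ ()", "✗ (3.67)was 2.5"
# An optional comma and space before "was" is also accepted.
CELL_RE = re.compile(
    r"^(?P<status>[✓✗—])"
    r"(?:\s*\((?P<score>\d+(?:\.\d+)?)?\))?"
    r"(?:,?\s*was\s*(?P<prev_status>[✓✗])?\s*"
    r"\(?(?P<prev_score>\d+(?:\.\d+)?)?\)?)?\s*$"
)

_MARKS = {"✓": True, "✗": False}


def _to_float(text: Optional[str]) -> Optional[float]:
    """Convert a captured number to float; missing or empty becomes None."""
    return float(text) if text else None


def parse_cell(cell: str) -> dict:
    """Parse one scorecard cell into current and previous results."""
    match = CELL_RE.match(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized scorecard cell: {cell!r}")
    return {
        "passed": _MARKS.get(match["status"]),  # None for "—" (no result)
        "score": _to_float(match["score"]),
        "prev_passed": _MARKS.get(match["prev_status"]),
        "prev_score": _to_float(match["prev_score"]),
    }


if __name__ == "__main__":
    for raw in ["✓ (4.5)", "—", "✓ (3.5)was ✗ (2)", "✗ (3.67)was 2.5"]:
        print(raw, "->", parse_cell(raw))
```

Missing values are returned as `None` rather than guessed, so a cell such as "was ✗ ()" yields a previous status of fail with no previous score.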

Model Status

→ anthropic/claude-haiku-4-5 stable
→ anthropic/claude-opus-4-6 stable
→ anthropic/claude-sonnet-4-6 stable
↑ gemini/gemini-2.5-flash up
↓ gemini/gemini-2.5-pro down
→ openai/gpt-5.4 stable
↑ openai/gpt-5.4-mini up

Raw Data

Detail log — full responses and judge verdicts per prompt
JSON — structured data for programmatic access
Markdown — plain text report
responses.json — raw model outputs
judgments.json — raw judge verdicts
run.log — debug log
Agent Skill — how to read and interpret this data
Methodology — how evaluations work