LLM Weather Report

Tracking raw LLM reasoning drift — pure endpoint, no agents

claude-haiku-4-5 lost common-sense-1. gpt-5.4-mini dropped on spatial-1 and is now failing it; gemini-2.5-flash dropped on causality-1 and is now failing it. claude-sonnet-4-6 is rising on common-sense-1.

June 23, 2026 — 8:55 AM CT

Drift Alerts

Provider Status

Scorecard

Model | ambiguity-1 | causality-1 | code-1 | common-sense-1 | logic-1 | math-1 | spatial-1
anthropic/claude-haiku-4-5 | ✓ (4.5) | ✓ (4.75) | ✓ (4.5) | ✗ (3), was ✓ (3.33) | ✓ (5) | ✓ (5) | ✓ (5)
anthropic/claude-opus-4-6 | ✓ (5) | ✓ (5) | ✓ (4.5) | ✓ (4.5) | ✓ (5) | ✓ (5) | ✓ (5)
anthropic/claude-sonnet-4-6 | ✓ (4.75) | ✓ (5) | ✓ (4.5) | ✓ (4.75), was 3.67 | ✓ (5) | ✓ (5) | ✓ (5)
gemini/gemini-2.5-flash | ✓ (4.5) | ✗ (1.75), was 3.5 | ✓ (4.75) | ✓ (3.75) | ✓ (4.67) | ✓ (5) | ✓ (5)
gemini/gemini-2.5-pro | ✓ (4.75) | ✓ (5) | ✓ (4.75) | ✓ (4.75) | ✓ (5) | ✓ (5) | ✓ (5)
ollama/llama3 | | | | | | |
openai/gpt-5.4 | ✓ (4.5) | ✓ (4.75) | ✓ (4.75) | ✓ (4.75) | ✓ (5) | ✓ (5) | ✓ (5)
openai/gpt-5.4-mini | ✓ (4.5) | ✓ (5) | ✓ (4.5) | ✓ (4.5) | ✓ (4.83) | ✓ (4.83) | ✗ (2.25), was 3.67
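
The report does not say how a scorecard cell becomes a drift alert, so the following is only a minimal sketch of one plausible rule: compare a cell's current score with its "was" score and take the pass/fail mark as given. The names `Cell` and `drift_label` are hypothetical, and the labels ("failing", "dropped", "rising") are borrowed from the summary wording rather than from any documented logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    """One scorecard cell (hypothetical structure, not the report's own schema)."""
    model: str
    test: str
    passed: bool                      # the check mark or cross shown in the cell
    score: float                      # the current score in parentheses
    previous: Optional[float] = None  # the "was" score; None when none is shown


def drift_label(cell: Cell) -> Optional[str]:
    """Return 'failing', 'dropped', or 'rising', or None when there is no change to report.

    Assumed rule: a cell only counts as drift when it carries a previous score.
    """
    if cell.previous is None or cell.score == cell.previous:
        return None
    if cell.score < cell.previous:
        # A lower score that also fails the check is reported as failing.
        return "dropped" if cell.passed else "failing"
    return "rising"


# Cells copied from the scorecard rows that carry a "was" value.
cells = [
    Cell("anthropic/claude-haiku-4-5", "common-sense-1", False, 3.0, 3.33),
    Cell("anthropic/claude-sonnet-4-6", "common-sense-1", True, 4.75, 3.67),
    Cell("gemini/gemini-2.5-flash", "causality-1", False, 1.75, 3.5),
    Cell("openai/gpt-5.4-mini", "spatial-1", False, 2.25, 3.67),
]

for cell in cells:
    label = drift_label(cell)
    if label is not None:
        print(f"{cell.model} {label} on {cell.test}")
```

Under this assumed rule, the four cells with a "was" value yield three "failing" labels and one "rising" label, which lines up with the summary at the top of the report.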

Model Status

Raw Data