LLM Weather Report

Tracking raw LLM reasoning drift — pure endpoint, no agents

gpt-5.4 lost logic-1, math-1, spatial-1, causality-1, code-1, ambiguity-1, and common-sense-1. gpt-5.4-mini lost math-1, causality-1, code-1, ambiguity-1, and common-sense-1. gpt-5.4-mini is recovering. gemini-2.5-flash scores are rising.
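
An alert such as "gpt-5.4-mini lost math-1" can be read as a test that passed in an earlier run but no longer passes. The sketch below assumes that definition and a hypothetical input shape (each model mapped to the set of test IDs it passed in a run). The names `lost_tests` and `format_alerts` are illustrative and are not taken from the report itself.

```python
from typing import Dict, List, Set

# Hypothetical input shape: model name -> set of test IDs it passed in one run.
Passes = Dict[str, Set[str]]


def lost_tests(previous: Passes, current: Passes) -> Dict[str, List[str]]:
    """Return, per model, the tests it passed previously but does not pass now."""
    lost: Dict[str, List[str]] = {}
    for model, before in previous.items():
        # A model missing from the current run counts as passing nothing.
        after = current.get(model, set())
        dropped = sorted(before - after)
        if dropped:
            lost[model] = dropped
    return lost


def format_alerts(lost: Dict[str, List[str]]) -> str:
    """Render one sentence per model, e.g. 'gpt-5.4-mini lost code-1, math-1.'"""
    return " ".join(
        f"{model} lost {', '.join(tests)}." for model, tests in sorted(lost.items())
    )
```

For example, if a model passed `{"math-1", "logic-1", "code-1"}` last run and only `{"logic-1"}` now, `format_alerts(lost_tests(prev, curr))` yields `"<model> lost code-1, math-1."`. Grouping all lost tests into one sentence per model keeps the summary readable when a model drops several tests at once.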

May 23, 2026 — 12:27 PM CT

Scorecard

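Model | ambiguity-1 | causality-1 | code-1 | common-sense-1 | logic-1 | math-1 | spatial-1
anthropic/claude-haiku-4-5 | ✓ (4) | ✓ (4.5) | ✓ (4.5) | ✓ (4) | ✓ (5) | ✓ (5) | ✓ (5)
anthropic/claude-opus-4-6 | ✓ (5) | ✓ (4.75) | ✓ (4.5) | ✓ (4) | ✓ (5) | ✓ (5) | ✓ (5)
anthropic/claude-sonnet-4-6 | ✓ (4.75) | ✓ (4.75) | ✓ (4.75) | ✓ (4.25) | ✓ (5) | ✓ (5) | ✓ (5)
gemini/gemini-2.5-flash | ✓ (4.25) | ✓ (5), was 3.83 | ✓ (5) | ✓ (5) | ✓ (5) | ✓ (5) | ✓ (5)
gemini/gemini-2.5-pro | ✓ (4.5) | ✓ (4.75) | ✓ (4.75) | ✓ (5) | ✓ (5) | ✓ (5) | ✓ (5)
ollama/llama3 | | | | | | |
openai/gpt-5.4 | | | | | | |
openai/gpt-5.4-mini | | | | | | |

openai/gpt-5.4-mini shows a single ✓ (5), in either logic-1 or spatial-1, the only two tests it has not lost.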
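
The "was 3.83" note in the gemini-2.5-flash row suggests a cell shows the current score and, when it changed, the previous one. The sketch below renders cells in that style. It is a hypothetical helper (`format_cell` and its `tol` parameter are illustrative names); it assumes an absent score means no result for the run and treats any present score as a ✓ cell, since the report does not state its pass threshold.

```python
from typing import Optional


def format_cell(
    score: Optional[float],
    previous: Optional[float] = None,
    tol: float = 1e-9,
) -> str:
    """Render a scorecard cell such as '✓ (4.5)', appending 'was X' on change.

    Hypothetical convention: None means no result, so the cell stays blank.
    """
    if score is None:
        return ""
    # ':g' drops trailing zeros, so 5.0 renders as '5' and 4.75 as '4.75'.
    cell = f"✓ ({score:g})"
    # Only mention the previous score when it differs beyond a small tolerance.
    if previous is not None and abs(score - previous) > tol:
        cell += f", was {previous:g}"
    return cell
```

With these rules, `format_cell(5.0, 3.83)` gives `"✓ (5), was 3.83"`, while an unchanged score such as `format_cell(4.5, 4.5)` gives just `"✓ (4.5)"`. The tolerance avoids spurious "was" notes from floating-point noise when scores are averaged across runs.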
