Language Models Can't Tell What's Missing

AbsenceBench is a benchmark for evaluating how well language models can identify and reason about missing information in text—revealing fundamental gaps in model comprehension.

4,300+ Test Examples

14 Models Evaluated

3 Task Categories


Model Performance

Top models by average score

🥇 Gemini-2.5-flash (thinking): 71.2
🥈 Claude-3.7-Sonnet (thinking): 69.6
🥉 Claude-3.7-Sonnet: 66.9
4. Gemini-2.5-flash: 63.6


About the Benchmark

The Challenge

While language models excel at processing explicit information, they struggle with a fundamental aspect of comprehension: identifying what's missing. AbsenceBench tests models' ability to detect gaps, omissions, and absent context in text.

Three Task Categories

Poetry: Find the missing lines in a recitation of a poem
Numerical Sequences: Detect when a number in a sequence is absent
GitHub Pull Requests: Identify missing lines within a PR's diff
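
All three categories share one shape: take an original document, drop some of its lines, and ask the model which lines are gone. The Python sketch below illustrates that setup under stated assumptions. The helper names (make_task, score), the omission rate, and the use of an exact-match F1 score are hypothetical choices for illustration; this page does not specify how AbsenceBench builds or scores its examples.

```python
import random


def make_task(lines, omit_fraction=0.2, seed=0):
    """Drop a random subset of lines (hypothetical construction).

    Returns (modified_lines, omitted_lines), both in original order.
    """
    rng = random.Random(seed)
    k = max(1, int(len(lines) * omit_fraction))  # always omit at least one line
    omitted_idx = set(rng.sample(range(len(lines)), k))
    modified = [line for i, line in enumerate(lines) if i not in omitted_idx]
    omitted = [line for i, line in enumerate(lines) if i in omitted_idx]
    return modified, omitted


def score(predicted, omitted):
    """Exact-match F1 between predicted and truly omitted lines.

    Assumption: lines are compared as a set after stripping whitespace,
    so duplicate lines in a document would collapse into one entry.
    """
    pred = {p.strip() for p in predicted}
    gold = {o.strip() for o in omitted}
    true_positives = len(pred & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # A numerical-sequence style example: the numbers 0..9, one per line.
    original = [str(n) for n in range(10)]
    modified, omitted = make_task(original, omit_fraction=0.2, seed=0)
    print("shown to model:", modified)
    print("expected answer:", omitted)
    print("perfect prediction scores:", score(omitted, omitted))
```

In this framing the model would see both the original and the modified version and return the omitted lines; a perfect answer scores 1.0, while an answer with no correct lines scores 0.0.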

Why It Matters

The ability to recognize missing information is crucial for real-world applications like code review, fact-checking, instruction following, and critical reasoning. Current models' limitations in this area reveal important gaps in their cognitive capabilities.


Full Model Leaderboard

Rank	Model	Poetry	Numerical Sequences	GitHub PRs	Average