The problem with grades
Every boulder problem carries a grade: a single value meant to capture how hard the climb is. In practice, grades are noisy. The first ascensionist proposes one, later climbers agree or disagree, and the consensus drifts over the years. Two problems sharing the same grade can feel wildly different, and a “soft” 7A may be easier than a “hard” 6C+.
The grade is a label. What we actually want is the underlying difficulty.
Climbers as a rating system
Each ascent log is a small piece of evidence: this climber tried this boulder and either sent it or didn't. If we treat climbers and boulders as players in a head-to-head game, we can fit an Elo-style model where each boulder and each climber has a latent strength, and the probability of a send depends on their difference.
Given enough overlapping ascents, the model can untangle “this climber is strong” from “this boulder is soft” — even though neither is directly observed.
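To make the "probability of a send depends on their difference" idea concrete, here is a minimal sketch. It assumes the textbook logistic Elo curve with a 400-point scale; the actual link function and scale used in the fitted model are not stated here, and the names `send_probability` and `log_likelihood` are illustrative only.

```python
import math


def send_probability(climber_elo: float, boulder_elo: float, scale: float = 400.0) -> float:
    """Chance that the climber sends the boulder.

    Assumption: standard logistic Elo curve (base 10, 400-point scale).
    A higher boulder Elo means a harder boulder, so the probability drops.
    """
    return 1.0 / (1.0 + 10.0 ** ((boulder_elo - climber_elo) / scale))


def log_likelihood(logs, climber_elos, boulder_elos) -> float:
    """Log-likelihood of a set of ascent logs under the curve above.

    logs: iterable of (climber_id, boulder_id, sent) with sent a bool.
    climber_elos / boulder_elos: dicts mapping ids to latent strengths.
    """
    total = 0.0
    for climber_id, boulder_id, sent in logs:
        p = send_probability(climber_elos[climber_id], boulder_elos[boulder_id])
        total += math.log(p if sent else 1.0 - p)
    return total
```

With equal ratings the send probability is one half, and a climber rated 400 points above a boulder sends it about 91% of the time under this curve. Fitting the model then amounts to finding the climber and boulder strengths that make the observed logs likely, which is where overlapping ascents do the untangling.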
The data
The model is fit on more than a million logged ascents. For each boulder I report a posterior Elo with a credible interval and compare it to the median Elo of all boulders sharing the same grade. The residual, the gap between the boulder's Elo and that grade median, and its z-score tell you how soft or hard the boulder is relative to its label.
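The comparison against the grade median can be sketched as follows. Two things here are assumptions rather than the method actually used: the residual is taken as boulder Elo minus grade median (so positive means harder than the label), and the z-score divides that residual by the sample standard deviation of Elos within the grade. The function name and the input layout are hypothetical.

```python
from collections import defaultdict
from statistics import median, stdev


def grade_residuals(boulders):
    """Residual and z-score of each boulder relative to its grade.

    boulders: list of (name, grade, elo) tuples.
    Returns {name: (residual, z)}.
    Assumption: z = residual / within-grade sample standard deviation.
    """
    by_grade = defaultdict(list)
    for _, grade, elo in boulders:
        by_grade[grade].append(elo)

    results = {}
    for name, grade, elo in boulders:
        elos = by_grade[grade]
        residual = elo - median(elos)
        # A grade with a single boulder has no spread to normalise by.
        spread = stdev(elos) if len(elos) > 1 else float("nan")
        z = residual / spread if spread > 0 else float("nan")
        results[name] = (residual, z)
    return results
```

Called on three made-up boulders of the same grade with Elos 1600, 1700, and 1800, this gives residuals of -100, 0, and 100, so the first reads as soft for the grade and the last as hard. A different normaliser, such as the boulder's own posterior standard deviation, would be an equally plausible choice.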
What's next
A proper write-up of the model, the priors, and the inference is coming. For now, the data speaks for itself: search for a boulder you've tried and see whether the numbers agree with your fingers.