Optimal Tennis Match

The Paradox of Federer's Success

Roger Federer, one of the greatest tennis players in history, once mentioned in a speech that he won only about 54% of the points he played. This statement may seem paradoxical: how can a player with such a small margin of superiority dominate the sport?

The answer lies in the amplification effect of match structures, where small advantages accumulate and produce larger outcomes.

Example of a 2-point tiebreaker

Let's consider a simple example. Alex is playing against Bob, and Alex has probability $p$ of beating Bob in a single point. If $p = 0.5$, Alex and Bob are equally strong; if $p > 0.5$, Alex is stronger.

Now we look at a 2-point tiebreaker (which we denote by $T_2$), meaning that the first player to win two points wins the match. We want to compute the probability that Alex wins $T_2$.

For Alex to win, one of the following must happen:

  • Alex wins the first two points: probability $p^2$.
  • Alex wins the first point, loses the second, but wins the third: probability $p(1-p)p$.
  • Alex loses the first point, but wins the second and third: probability $(1-p)p^2$.

Summing these possibilities, we obtain

$$\mathbf{P}(T_2; p) = p^2 + p(1-p)p + (1-p)p^2 = 3p^2 - 2p^3.$$

Here $\mathbf{P}(T_2; p)$ denotes the probability that Alex wins $T_2$. Plugging Federer's value $p = 0.54$ into the formula gives $\mathbf{P}(T_2; 0.54) = 0.559872$.
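The case analysis above is easy to check numerically. Here is a minimal sketch (the function names are my own, chosen for illustration) that sums the three winning sequences and compares the result with the simplified polynomial.

```python
def p_t2_by_cases(p: float) -> float:
    """Sum the three sequences in which Alex wins the 2-point tiebreaker."""
    q = 1 - p  # probability that Bob wins a single point
    return p * p + p * q * p + q * p * p


def p_t2_closed_form(p: float) -> float:
    """The simplified polynomial 3p^2 - 2p^3."""
    return 3 * p**2 - 2 * p**3


if __name__ == "__main__":
    for p in (0.5, 0.54, 0.6):
        print(p, p_t2_by_cases(p), p_t2_closed_form(p))
```

Both functions should agree up to floating-point rounding for any $p$ between 0 and 1.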

In such a short match the amplification is only about 2 percentage points (from 54% to roughly 56%), but longer matches can magnify even a small advantage much more.

Complex Game Structures

From here it is natural to go beyond $T_2$ and look at longer tiebreakers or even at the real structure of a tennis match, with its somewhat complicated system of games and sets.

Unfortunately, the mathematics quickly becomes complex. For this reason I built a Probability Calculator that allows us to compute $\mathbf{P}(T; p)$ for any match structure $T$.

Figure 1. Probability $\mathbf{P}(M; p)$ for the official best-of-three-sets tennis match on the left. Here we also highlight Federer's value. On the right we plot $\mathbf{P}(T; p)$ for N-point tiebreakers.

On the plot on the right (on the bottom if you are on a phone), we see $\mathbf{P}(T; p)$ for tiebreakers of different lengths. If we play only 1 point, then it is just $\mathbf{P}(T; p) = p$, but as the length of the tiebreaker increases (play with the slider), the curve goes up for $p > 1/2$ and down for $p < 1/2$. This is exactly the amplification effect. Of course, as matches become longer, this amplification becomes more pronounced.
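For tiebreakers the curve can also be computed directly. The sketch below assumes that an N-point tiebreaker means "first to N points" with no win-by-two rule, as in the $T_2$ example; the function name is hypothetical and this is not the code behind the calculator.

```python
from math import comb


def tiebreak_win_probability(n: int, p: float) -> float:
    """Probability that Alex is first to n points, winning each point with probability p."""
    q = 1 - p
    # Alex wins n points and Bob wins k < n points. The last point goes to Alex,
    # so the first n - 1 + k points can be ordered in comb(n - 1 + k, k) ways.
    return sum(comb(n - 1 + k, k) * p**n * q**k for k in range(n))


if __name__ == "__main__":
    for n in (1, 2, 7, 20):
        print(n, tiebreak_win_probability(n, 0.54))
```

With $p = 0.54$ the printed values should increase with $N$, which is the amplification effect seen in the plot.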

On the plot on the left we look at the full tennis match, best-of-three sets. Here, the amplification effect is enormous. If we plug in Federer's probability of 54%, the predicted probability of winning a match becomes approximately 86%. A net $+32\%$!

A quick search on Wikipedia shows that Federer won about 82% of the matches in his career. This is quite encouraging: we obtained the correct order of magnitude.

Of course the numbers do not match exactly. The reason is that tennis points are not independent and identically distributed. Federer has a much higher probability of winning against weaker opponents, and even within a match players may perform differently in crucial moments.
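Under the idealized assumption that every point is won independently with the same probability $p$ (so no serve advantage), the match probability can be sketched in a few lines. I assume here standard scoring: games to 4 points won by two, sets to 6 games with a tiebreak to 7 (won by two) at 6-6, and best of three sets. The function names are illustrative only.

```python
from math import comb


def race_win_by_two(n: int, p: float) -> float:
    """Win a first-to-n race that must be won by two (game: n=4, tiebreak: n=7)."""
    q = 1 - p
    # Wins by n-0, n-1, ..., n-(n-2): the last point always goes to the winner.
    direct = sum(comb(n - 1 + k, k) * p**n * q**k for k in range(n - 1))
    # Reach (n-1)-(n-1), then get two points ahead before the opponent does.
    tied = comb(2 * n - 2, n - 1) * (p * q) ** (n - 1)
    return direct + tied * p**2 / (p**2 + q**2)


def set_win_probability(p: float) -> float:
    """Win a set: first to 6 games, 7-5 allowed, tiebreak at 6-6."""
    g = race_win_by_two(4, p)  # probability of winning a game
    h = 1 - g
    t = race_win_by_two(7, p)  # probability of winning the tiebreak
    direct = sum(comb(5 + k, k) * g**6 * h**k for k in range(5))  # 6-0 to 6-4
    five_all = comb(10, 5) * (g * h) ** 5
    # From 5-5: win two games in a row (7-5), or split them and win the tiebreak.
    return direct + five_all * (g * g + 2 * g * h * t)


def match_win_probability(p: float) -> float:
    """Best of three sets: the same polynomial as the 2-point tiebreaker, applied to sets."""
    s = set_win_probability(p)
    return 3 * s**2 - 2 * s**3


if __name__ == "__main__":
    print(match_win_probability(0.54))
```

Under these assumptions the value for $p = 0.54$ should land close to the 86% quoted above.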

Are we playing tennis right?

While the amplification effect of a tennis match is quite remarkable, it can largely be explained by the fact that a tennis match lasts much longer than a short tiebreaker. For this reason we now look at a tiebreaker that lasts roughly as long as a tennis match. Would it amplify skill differences better or worse?

However, comparing match structures is not straightforward. A tennis match can technically last forever because of advantages, and on top of that, how do we directly compare the amplification effect of two different match structures? To go beyond hand-wavy intuition, we need to introduce two key quantities: fairness and expected match length.

Fairness

Fairness is our way of quantifying the amplification effect of a match structure. While all tiebreakers amplify the advantage of the stronger player, $T_7$ is clearly more “fair” than $T_2$, and the best-of-three tennis match is more fair than $T_7$.

We define fairness as the slope of the tangent to $\mathbf{P}(M; p)$ at $p = 0.5$ (the dashed line in the right plot of Figure 1). A steeper slope means that a small edge for Alex turns into a much larger advantage. We denote this slope by $\mathbf{F}(M)$.

Formally, $\mathbf{F}(M)$ is the derivative of $\mathbf{P}(M; p)$ evaluated at $p=0.5$. Intuitively, suppose $p = 0.5 + \epsilon$ where $\epsilon$ is a small margin. By Taylor expansion,

$$\mathbf{P}(M; 0.5 + \epsilon) \simeq 0.5 + \mathbf{F}(M)\epsilon.$$

Thus the margin has been amplified by a factor $\mathbf{F}(M)$.
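When no closed form is at hand, the slope can be estimated numerically. A minimal sketch, assuming we have some function that returns the win probability for a given $p$ (the helper names are hypothetical):

```python
from typing import Callable


def fairness(win_probability: Callable[[float], float], h: float = 1e-5) -> float:
    """Central-difference estimate of the derivative of P(M; p) at p = 0.5."""
    return (win_probability(0.5 + h) - win_probability(0.5 - h)) / (2 * h)


def p_t2(p: float) -> float:
    """Win probability of the 2-point tiebreaker from the earlier example."""
    return 3 * p**2 - 2 * p**3


if __name__ == "__main__":
    print(fairness(p_t2))         # slope for the 2-point tiebreaker
    print(fairness(lambda p: p))  # a single point does not amplify at all
```

For $T_2$ the exact derivative is $6p - 6p^2$, which equals $1.5$ at $p = 0.5$, so the estimate should be very close to that value.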

Match length

While making a match longer amplifies the advantage of the stronger player, there is an obvious trade-off: matches cannot last forever. Therefore we also evaluate a match structure $M$ based on how long it takes to play.

Since many tennis formats can theoretically last forever (for example with advantages), we measure the expected number of points played, denoted by $\mathbf{E}(M)$.
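For a first-to-N tiebreaker the expected length can be computed by summing over all final scores. In this sketch I assume the expectation is taken for evenly matched players ($p = 0.5$) by default; the function name is illustrative.

```python
from math import comb


def tiebreak_expected_length(n: int, p: float = 0.5) -> float:
    """Expected number of points in a first-to-n tiebreaker."""
    q = 1 - p
    total = 0.0
    for k in range(n):  # k = points won by the loser
        ways = comb(n - 1 + k, k)  # orderings before the final point
        # Either Alex wins n-k or Bob wins n-k.
        prob_of_score = ways * (p**n * q**k + q**n * p**k)
        total += (n + k) * prob_of_score
    return total


if __name__ == "__main__":
    for n in (1, 2, 7, 20):
        print(n, tiebreak_expected_length(n))
```

For example, $T_2$ between equal players ends after two points half of the time and after three points otherwise, giving an expected length of $2.5$.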

The Tradeoff

We now want to understand the relationship between fairness and match length.

In Figure 2 we plot $\mathbf{E}(M)$ against $\mathbf{F}(M)$. Each dot corresponds to a tiebreaker of a different length: as tiebreakers become longer (higher on the y-axis), they also become fairer (further to the right).

The natural question is how the official tennis format compares: best-of-three sets with first-to-six games and tiebreakers.

This format is shown as the star in Figure 2. Surprisingly, it falls below the tiebreaker curve. In other words, if we replaced the match by a single tiebreaker of the same expected length (about 87 points), the result would actually be less fair.

Figure 2. Trade-off between fairness $\mathbf{F}(M)$ and expected match length $\mathbf{E}(M)$. The dots correspond to tiebreakers of different lengths, the rhombuses represent the optimal matches $O_N$, and the star represents the official best-of-three-sets tennis match. The Unreachable Zone is the region that, according to our theorem, no match structure can achieve.

Before seeing this result I assumed that tiebreakers would be more efficient than the official format: perhaps less exciting to watch, but mathematically more efficient. But the opposite turns out to be true. This motivated me to investigate what makes tennis matches effective and whether an even more efficient structure exists.

The Optimal Tennis Match

After some experimentation I devised a match structure that is significantly more efficient.

The match starts at score $0$. Whenever the first player wins a point the score increases by $1$, and when the second player wins a point it decreases by $1$. When the score reaches $N$, the first player wins; if it reaches $-N$, the second player wins. We call this match structure $O_N$.

Using the calculator we observe the elegant formulas

$$ \begin{align*} \mathbf{F}(O_N) &= N, \\ \mathbf{E}(O_N) &= N^2. \end{align*} $$

In Figure 2 these matches appear as black rhombuses. We see that they vastly outperform both tiebreakers and the official tennis format.
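These formulas can be checked without the calculator. The following sketch (my own illustration, not the calculator's code) pushes the probability distribution of the score forward one point at a time until essentially all matches have finished, and estimates the fairness with a finite difference.

```python
def optimal_match(n: int, p: float, tol: float = 1e-13) -> tuple[float, float]:
    """Return (win probability, expected number of points) for the match O_n."""
    dist = {0: 1.0}  # dist[s] = probability the match is still running at score s
    win = 0.0
    expected = 0.0
    points = 0
    while sum(dist.values()) > tol:
        points += 1
        nxt: dict[int, float] = {}
        for score, mass in dist.items():
            for step, prob in ((1, p), (-1, 1 - p)):
                new_score = score + step
                if abs(new_score) == n:  # the match ends on this point
                    expected += points * mass * prob
                    if new_score == n:
                        win += mass * prob
                else:
                    nxt[new_score] = nxt.get(new_score, 0.0) + mass * prob
        dist = nxt
    return win, expected


def optimal_match_fairness(n: int, h: float = 1e-4) -> float:
    """Central-difference estimate of the slope of the win probability at p = 0.5."""
    up, _ = optimal_match(n, 0.5 + h)
    down, _ = optimal_match(n, 0.5 - h)
    return (up - down) / (2 * h)


if __name__ == "__main__":
    for n in range(1, 6):
        _, length = optimal_match(n, 0.5)
        print(n, optimal_match_fairness(n), length)
```

For small $N$ the printed fairness and expected length should be close to $N$ and $N^2$ respectively, up to the truncation tolerance and the finite-difference error.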

The Theorem

You may have guessed that the name $O_N$ stands for “optimal”. In fact, I proved that no match structure can do better.

For any match structure $M$,

$$\mathbf{E}(M) \geq \mathbf{F}(M)^2.$$
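As a quick sanity check on the smallest nontrivial tiebreaker, and assuming $\mathbf{E}$ is evaluated for evenly matched players ($p = 0.5$): differentiating $\mathbf{P}(T_2; p) = 3p^2 - 2p^3$ gives the fairness, and at $p = 0.5$ the match ends after two points with probability $1/2$ and after three points otherwise.

$$ \begin{align*} \mathbf{F}(T_2) &= (6p - 6p^2)\big|_{p = 0.5} = 1.5, \\ \mathbf{E}(T_2) &= 2 \cdot \tfrac{1}{2} + 3 \cdot \tfrac{1}{2} = 2.5 \geq 1.5^2 = 2.25. \end{align*} $$

So $T_2$ satisfies the bound but not with equality, whereas $O_N$ attains it exactly, since $\mathbf{E}(O_N) = N^2 = \mathbf{F}(O_N)^2$.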

The proof is available here.

Conclusion

Perhaps we should at least try my proposed optimal tennis match! I think it would be great fun (trust me, I tested it with a friend).