This project analyzes a real-world, ongoing dataset of padel match results to move beyond simple win/loss records and develop a robust player performance evaluation. The analysis is structured as a formal case study, demonstrating a complete data science workflow from data wrangling to statistical inference and interpretation.
The fundamental assumption of this analysis is that match, set, game, and tiebreak outcomes can be modeled as a Binomial process. Each event is treated as an independent Bernoulli trial where a player either wins or does not. This allows us to estimate a player’s underlying, unobserved skill parameter (\(p\)), representing their true probability of winning any given event.
To provide a comprehensive picture, this report employs two major statistical paradigms:

1. Frequentist Analysis: to provide objective, long-run probabilities and answer “yes/no” questions about statistical significance.
2. Bayesian Analysis: to quantify our certainty and provide a more intuitive understanding of player skill, especially given the limited data.
The first step is to estimate each player’s skill from the observed data. We calculate the Maximum Likelihood Estimate (MLE) of the true win probability (\(p\)) for each player at all four levels. This is the observed proportion of wins, and the results are summarized in the leaderboard below, ranked by Game Win Percentage.
Metric Definitions:

* All “Played” counts include every valid match, including those that ended in a draw.
* `Match_Win_Pct` is calculated using a points system in which a win is worth 1 point and a draw 0.5 points.
* All other win percentages are the simple proportion of wins to total events played.
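The points system can be expressed as a minimal sketch (Python for illustration; the helper name is hypothetical):

```python
# Sketch of the points-based match metric: win = 1 point, draw = 0.5, loss = 0.
def match_win_pct(wins, draws, played):
    """Points earned as a percentage of matches played (None if none played)."""
    if played == 0:
        return None
    return round(100 * (wins + 0.5 * draws) / played, 1)

# Kim: 12 decisive wins and 1 draw in 19 matches (counts inferred from the
# match-level test table) reproduce the leaderboard's 65.8.
print(match_win_pct(12, 1, 19))  # 65.8
```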
Player | Matches_Played | Match_Win_Pct | Sets_Played | Set_Win_Pct | Games_Played | Game_Win_Pct | Tiebreaks_Played | Tiebreak_Win_Pct |
---|---|---|---|---|---|---|---|---|
Antun veli | 1 | 100.0 | 2 | 100.0 | 20 | 60.0 | 0 | NA |
Kim | 19 | 65.8 | 43 | 62.8 | 458 | 55.9 | 5 | 100.0 |
Anttu | 7 | 78.6 | 17 | 70.6 | 176 | 54.5 | 3 | 33.3 |
Tomi | 5 | 50.0 | 13 | 53.8 | 127 | 53.5 | 2 | 0.0 |
Jone | 19 | 42.1 | 41 | 41.5 | 444 | 52.5 | 5 | 80.0 |
Jussi | 2 | 100.0 | 6 | 66.7 | 58 | 51.7 | 1 | 0.0 |
Hastis | 3 | 50.0 | 2 | 50.0 | 28 | 50.0 | 0 | NA |
Jon | 15 | 50.0 | 38 | 50.0 | 394 | 45.4 | 4 | 50.0 |
Antti | 17 | 20.6 | 38 | 28.9 | 407 | 41.3 | 4 | 0.0 |
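For every other column in the leaderboard, the MLE is simply wins divided by events played. A short Python sketch (the helper name is hypothetical):

```python
# The MLE of a Binomial success probability is the observed proportion of wins.
def mle_win_pct(wins, played):
    """Observed win percentage (the MLE of p), or None when nothing was played."""
    if played == 0:
        return None  # rendered as NA in the leaderboard
    return round(100 * wins / played, 1)

# Kim's game-level entry: 256 wins in 458 games (counts from the test tables).
print(mle_win_pct(256, 458))  # 55.9
```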
Next, we use hypothesis testing to determine whether each player’s win proportion is significantly greater than 50%, applying a one-sided test at each of the four levels (game, set, match, and tiebreak, in the tables below). For these tests we consider only decisive outcomes (wins and losses), excluding draws.
Game level:

Player | Wins | Decisive Played (n) | P-Value | 95% CI Lower | 95% CI Upper | Significant |
---|---|---|---|---|---|---|
Kim | 256 | 458 | 0.006 | 0.521 | 1 | Yes |
Anttu | 96 | 176 | 0.114 | 0.483 | 1 | No |
Jone | 233 | 444 | 0.148 | 0.486 | 1 | No |
Antun veli | 12 | 20 | 0.186 | 0.419 | 1 | No |
Tomi | 68 | 127 | 0.212 | 0.463 | 1 | No |
Jussi | 30 | 58 | 0.396 | 0.411 | 1 | No |
Hastis | 14 | 28 | 0.500 | 0.352 | 1 | No |
Jon | 179 | 394 | 0.965 | 0.414 | 1 | No |
Antti | 168 | 407 | 1.000 | 0.373 | 1 | No |
Set level:

Player | Wins | Decisive Played (n) | P-Value | 95% CI Lower | 95% CI Upper | Significant |
---|---|---|---|---|---|---|
Anttu | 11 | 15 | 0.035 | 0.521 | 1 | Yes |
Kim | 27 | 43 | 0.047 | 0.502 | 1 | Yes |
Antun veli | 2 | 2 | 0.079 | 0.425 | 1 | No |
Jussi | 4 | 6 | 0.207 | 0.347 | 1 | No |
Tomi | 6 | 11 | 0.382 | 0.315 | 1 | No |
Jon | 18 | 36 | 0.500 | 0.368 | 1 | No |
Hastis | 1 | 2 | 0.500 | 0.121 | 1 | No |
Jone | 16 | 39 | 0.869 | 0.291 | 1 | No |
Antti | 11 | 38 | 0.995 | 0.186 | 1 | No |
Match level:

Player | Wins | Decisive Played (n) | P-Value | 95% CI Lower | 95% CI Upper | Significant |
---|---|---|---|---|---|---|
Anttu | 5 | 6 | 0.051 | 0.498 | 1 | No |
Kim | 12 | 18 | 0.079 | 0.473 | 1 | No |
Jussi | 2 | 2 | 0.079 | 0.425 | 1 | No |
Antun veli | 1 | 1 | 0.159 | 0.270 | 1 | No |
Jon | 7 | 14 | 0.500 | 0.299 | 1 | No |
Hastis | 1 | 2 | 0.500 | 0.121 | 1 | No |
Tomi | 2 | 4 | 0.500 | 0.182 | 1 | No |
Jone | 7 | 17 | 0.767 | 0.241 | 1 | No |
Antti | 3 | 16 | 0.994 | 0.078 | 1 | No |
Tiebreak level:

Player | Wins | Decisive Played (n) | P-Value | 95% CI Lower | 95% CI Upper | Significant |
---|---|---|---|---|---|---|
Kim | 5 | 5 | 0.013 | 0.649 | 1 | Yes |
Jone | 4 | 5 | 0.090 | 0.435 | 1 | No |
Jon | 2 | 4 | 0.500 | 0.182 | 1 | No |
Anttu | 1 | 3 | 0.718 | 0.078 | 1 | No |
Jussi | 0 | 1 | 0.841 | 0.000 | 1 | No |
Tomi | 0 | 2 | 0.921 | 0.000 | 1 | No |
Antti | 0 | 4 | 0.977 | 0.000 | 1 | No |
Hastis | 0 | 0 | NA | NA | NA | NA |
Antun veli | 0 | 0 | NA | NA | NA | NA |
A key observation from these tables is that very few results are statistically significant—only a handful of tests showed a p-value below our 0.05 threshold. This is not surprising and highlights a central theme of this analysis: detecting a small winning edge requires a substantial amount of evidence. As our subsequent power analysis will confirm, many of our tests were not sensitive enough to confidently distinguish a real, small skill difference from random chance.
A hypothesis test can fail to find a significant result simply because it lacks statistical power. This analysis evaluates the sensitivity of our tests. We define a “meaningfully skilled player” as someone with a true win rate of 55%. The table below shows the probability (power) of our test correctly identifying such a player, given our current sample sizes.
Player | N (Match) | Power | N (Set) | Power | N (Game) | Power | N (Tiebreak) | Power |
---|---|---|---|---|---|---|---|---|
Kim | 18 | 0.11 | 43 | 0.16 | 458 | 0.69 | 5 | 0.08 |
Jone | 17 | 0.11 | 39 | 0.15 | 444 | 0.68 | 5 | 0.08 |
Antti | 16 | 0.11 | 38 | 0.15 | 407 | 0.65 | 4 | 0.07 |
Jon | 14 | 0.10 | 36 | 0.15 | 394 | 0.63 | 4 | 0.07 |
Anttu | 6 | 0.08 | 15 | 0.10 | 176 | 0.38 | 3 | 0.07 |
Tomi | 4 | 0.07 | 11 | 0.09 | 127 | 0.30 | 2 | 0.07 |
Jussi | 2 | 0.07 | 6 | 0.08 | 58 | 0.19 | 1 | 0.06 |
Hastis | 2 | 0.07 | 2 | 0.07 | 28 | 0.13 | 0 | NA |
Antun veli | 1 | 0.06 | 2 | 0.07 | 20 | 0.12 | 0 | NA |
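The power figures above were presumably produced with standard software (e.g. R’s pwr package); a plain normal-approximation sketch in Python reproduces the game-level values to within rounding:

```python
from math import sqrt, erf

def normal_cdf(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

def power_one_sided(n, p1=0.55, p0=0.5):
    """Approximate power of a one-sided one-proportion z-test at alpha = 0.05."""
    z_alpha = 1.6449                                   # 95th percentile of N(0, 1)
    crit = p0 + z_alpha * sqrt(p0 * (1 - p0) / n)      # smallest p_hat that rejects H0
    se1 = sqrt(p1 * (1 - p1) / n)                      # standard error under H1
    return normal_cdf((p1 - crit) / se1)

print(round(power_one_sided(458), 2))  # 0.69, matching Kim's game-level power
```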
The results confirm that our analysis is most reliable at the game level, where the larger sample sizes yield substantially higher power.
The following plots visualize the relationship between statistical power, effect size, and sample size.
The curve above shows that at a sample size close to our current largest (n = 450), the test becomes highly sensitive (power > 80%) as a player’s true win rate approaches 58%.
This second curve simulates a future scenario with more data (n=600), showing that the test would become powerful enough to reliably detect even smaller winning edges (~56%).
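Under the same normal approximation, we can also invert the question and ask how many games are needed for 80% power (a sketch, not necessarily the report’s exact calculation): roughly 617 games at a 55% true win rate, and far fewer for larger edges, consistent with the n = 600 scenario above.

```python
from math import sqrt, ceil

def n_required(p1, p0=0.5):
    """Normal-approximation sample size for 80% power in a one-sided
    one-proportion z-test at alpha = 0.05."""
    z_alpha, z_beta = 1.6449, 0.8416  # one-sided 5% and 80%-power normal quantiles
    num = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))
    return ceil((num / (p1 - p0)) ** 2)

print(n_required(0.55))  # 617
print(n_required(0.56))  # 428 -- a 56% edge is detectable well before n = 600
```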
As a complementary approach, we use Bayesian inference, which is well suited to quantifying our certainty given the limited data. We use a Beta-Binomial model with a weakly informative prior (`Beta(2, 2)`), which assumes a player is likely about average before we see any of their results.

The table below shows the direct probability that each player’s true skill is greater than 50% (`P(p > 0.5)`). This provides a more intuitive measure of evidence than a p-value.
Player | Match | Set | Game | Tiebreak |
---|---|---|---|---|
Kim | 0.857 | 0.948 | 0.994 | 0.965 |
Anttu | 0.828 | 0.942 | 0.884 | 0.344 |
Jone | 0.143 | 0.146 | 0.851 | 0.855 |
Antun veli | 0.688 | 0.812 | 0.798 | 0.500 |
Tomi | 0.363 | 0.598 | 0.785 | 0.187 |
Jussi | 0.812 | 0.746 | 0.601 | 0.313 |
Hastis | 0.344 | 0.500 | 0.500 | 0.500 |
Jon | 0.407 | 0.500 | 0.035 | 0.500 |
Antti | 0.006 | 0.006 | 0.000 | 0.063 |
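Exactly, `P(p > 0.5)` is one minus the posterior Beta CDF at 0.5 (e.g. `pbeta` in R); the stdlib-only Monte Carlo sketch below (hypothetical helper) approximates the same quantity:

```python
import random

def prob_better_than_even(wins, losses, a=2, b=2, draws=100_000, seed=1):
    """Monte Carlo estimate of P(p > 0.5) under the Beta(a + wins, b + losses)
    posterior, i.e. a Beta(2, 2) prior updated with the decisive results."""
    rng = random.Random(seed)
    hits = sum(rng.betavariate(a + wins, b + losses) > 0.5 for _ in range(draws))
    return hits / draws

# Kim's game-level record (256 wins, 202 losses) gives roughly 0.994,
# matching the table above.
print(round(prob_better_than_even(256, 202), 3))
```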
Finally, we visualize the full posterior distributions. Wider curves indicate more uncertainty, while narrower curves indicate more certainty. These plots provide the most complete picture of our findings.
This report serves as a mid-term summary of the project, with the central theme being the critical role of sample size in statistical certainty. This explains our key frequentist finding: despite several players having winning records, very few of these results were found to be statistically significant.
The dual-analysis approach provided a comprehensive picture. While the frequentist tests gave us objective “yes/no” answers on significance, the Bayesian analysis offered a more nuanced view of uncertainty, and the posterior plots provided the clearest visualization of our conclusions.
This pattern holds across all levels of analysis, confirming that the most reliable insights are derived from the game-level data where our sample size is largest.
A Note on the Test Statistic
The frequentist analysis used a one-proportion z-test. The z-statistic is an intuitive measure of evidence: it counts how many standard errors our observed result (e.g., a 56% win rate) is away from the null hypothesis (a 50% win rate). A large z-score indicates that the result is far enough from 50% that it is unlikely to be due to random chance, leading to a significant p-value.
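A Python sketch of that test, including the one-sided 95% lower confidence bound (which is why the tables’ upper bound is fixed at 1); it reproduces, for example, Kim’s game-level row:

```python
from math import sqrt, erf

def normal_cdf(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

def one_prop_ztest(wins, n, p0=0.5):
    """One-sided one-proportion z-test of H1: p > p0, plus the one-sided
    95% lower confidence bound on p."""
    p_hat = wins / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # standard error under the null
    p_value = 1 - normal_cdf(z)
    ci_lower = p_hat - 1.6449 * sqrt(p_hat * (1 - p_hat) / n)
    return z, p_value, ci_lower

z, p, lo = one_prop_ztest(256, 458)  # Kim's game-level row
print(round(z, 2), round(p, 3), round(lo, 3))  # 2.52 0.006 0.521
```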
This report constitutes the complete exploratory and inferential analysis phase of the project. The final phase will focus on predictive modeling and productization by developing an R Shiny web application.
The planned features for the application include: