Introduction: The Analytical Framework

Objective

This project analyzes a real-world, ongoing dataset of padel match results to move beyond simple win/loss records and develop a robust player performance evaluation. The analysis is structured as a formal case study, demonstrating a complete data science workflow from data wrangling to statistical inference and interpretation.

Core Assumptions

The fundamental assumption of this analysis is that match, set, game, and tiebreak outcomes can be modeled as a Binomial process. Each event is treated as an independent Bernoulli trial where a player either wins or does not; this independence is a simplification, since games within a set (and sets within a match) are not fully independent in practice. This model allows us to estimate a player’s underlying, unobserved skill parameter (\(p\)), representing their true probability of winning any given event.

Dual-Approach Analysis

To provide a comprehensive picture, this report employs two major statistical paradigms:

  1. Frequentist Analysis: To provide objective, long-run probabilities and answer “yes/no” questions about statistical significance.
  2. Bayesian Analysis: To quantify our certainty and provide a more intuitive understanding of player skill, especially given the limited data.


1. Descriptive Statistics: Player Leaderboard

The first step is to estimate each player’s skill from the observed data. We calculate the Maximum Likelihood Estimate (MLE) of the true win probability (\(p\)) for each player at all four levels. This is the observed proportion of wins, and the results are summarized in the leaderboard below, ranked by Game Win Percentage.

Metric Definitions:

  • All “Played” counts include every valid match, including those that ended in a draw.
  • Match_Win_Pct is calculated using a points system where a win is worth 1 point and a draw 0.5 points.
  • All other win percentages are the simple proportion of wins to total events played.
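These definitions can be sketched in a few lines. The project itself is built in R, so the Python below is purely illustrative; the example figures reproduce Kim’s leaderboard row (256 game wins in 458 games; 12 match wins and 1 draw in 19 matches).

```python
# Illustrative sketch of the leaderboard metrics (the project itself uses R).
# A draw contributes 0.5 points at the match level only.

def win_pct(wins, played):
    """MLE of the win probability p: the observed proportion of wins, in %."""
    return round(100 * wins / played, 1) if played > 0 else None

def match_win_pct(wins, draws, played):
    """Match-level metric: a win scores 1 point, a draw scores 0.5 points."""
    return round(100 * (wins + 0.5 * draws) / played, 1) if played > 0 else None

print(win_pct(256, 458))         # Kim's Game_Win_Pct -> 55.9
print(match_win_pct(12, 1, 19))  # Kim's Match_Win_Pct -> 65.8
```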

Overall Player Performance Summary (MLEs)

| Player | Matches_Played | Match_Win_Pct | Sets_Played | Set_Win_Pct | Games_Played | Game_Win_Pct | Tiebreaks_Played | Tiebreak_Win_Pct |
|---|---|---|---|---|---|---|---|---|
| Antun veli | 1 | 100.0 | 2 | 100.0 | 20 | 60.0 | 0 | NA |
| Kim | 19 | 65.8 | 43 | 62.8 | 458 | 55.9 | 5 | 100.0 |
| Anttu | 7 | 78.6 | 17 | 70.6 | 176 | 54.5 | 3 | 33.3 |
| Tomi | 5 | 50.0 | 13 | 53.8 | 127 | 53.5 | 2 | 0.0 |
| Jone | 19 | 42.1 | 41 | 41.5 | 444 | 52.5 | 5 | 80.0 |
| Jussi | 2 | 100.0 | 6 | 66.7 | 58 | 51.7 | 1 | 0.0 |
| Hastis | 3 | 50.0 | 2 | 50.0 | 28 | 50.0 | 0 | NA |
| Jon | 15 | 50.0 | 38 | 50.0 | 394 | 45.4 | 4 | 50.0 |
| Antti | 17 | 20.6 | 38 | 28.9 | 407 | 41.3 | 4 | 0.0 |

2. Inferential Analysis: Frequentist Approach

Next, we use hypothesis testing to determine if a player’s performance is statistically significant. For this test of win proportion, we focus on decisive outcomes (wins and losses).

  • Null Hypothesis (H0): The player’s true win probability is 50% or less (p <= 0.5).
  • Alternative Hypothesis (Ha): The player’s true win probability is greater than 50% (p > 0.5).
  • Significance Level: We use a standard \(\alpha = 0.05\).

Game-Level Analysis

Game Level Hypothesis Test Results (H0: p <= 0.5)

| Player | Wins | Decisive Played (n) | P-Value | 95% CI Lower | 95% CI Upper | Significant |
|---|---|---|---|---|---|---|
| Kim | 256 | 458 | 0.006 | 0.521 | 1 | Yes |
| Anttu | 96 | 176 | 0.114 | 0.483 | 1 | No |
| Jone | 233 | 444 | 0.148 | 0.486 | 1 | No |
| Antun veli | 12 | 20 | 0.186 | 0.419 | 1 | No |
| Tomi | 68 | 127 | 0.212 | 0.463 | 1 | No |
| Jussi | 30 | 58 | 0.396 | 0.411 | 1 | No |
| Hastis | 14 | 28 | 0.500 | 0.352 | 1 | No |
| Jon | 179 | 394 | 0.965 | 0.414 | 1 | No |
| Antti | 168 | 407 | 1.000 | 0.373 | 1 | No |

Set-Level Analysis

Set Level Hypothesis Test Results (H0: p <= 0.5)

| Player | Wins | Decisive Played (n) | P-Value | 95% CI Lower | 95% CI Upper | Significant |
|---|---|---|---|---|---|---|
| Anttu | 11 | 15 | 0.035 | 0.521 | 1 | Yes |
| Kim | 27 | 43 | 0.047 | 0.502 | 1 | Yes |
| Antun veli | 2 | 2 | 0.079 | 0.425 | 1 | No |
| Jussi | 4 | 6 | 0.207 | 0.347 | 1 | No |
| Tomi | 6 | 11 | 0.382 | 0.315 | 1 | No |
| Jon | 18 | 36 | 0.500 | 0.368 | 1 | No |
| Hastis | 1 | 2 | 0.500 | 0.121 | 1 | No |
| Jone | 16 | 39 | 0.869 | 0.291 | 1 | No |
| Antti | 11 | 38 | 0.995 | 0.186 | 1 | No |

Match-Level Analysis

Match Level Hypothesis Test Results (H0: p <= 0.5)

| Player | Wins | Decisive Played (n) | P-Value | 95% CI Lower | 95% CI Upper | Significant |
|---|---|---|---|---|---|---|
| Anttu | 5 | 6 | 0.051 | 0.498 | 1 | No |
| Kim | 12 | 18 | 0.079 | 0.473 | 1 | No |
| Jussi | 2 | 2 | 0.079 | 0.425 | 1 | No |
| Antun veli | 1 | 1 | 0.159 | 0.270 | 1 | No |
| Jon | 7 | 14 | 0.500 | 0.299 | 1 | No |
| Hastis | 1 | 2 | 0.500 | 0.121 | 1 | No |
| Tomi | 2 | 4 | 0.500 | 0.182 | 1 | No |
| Jone | 7 | 17 | 0.767 | 0.241 | 1 | No |
| Antti | 3 | 16 | 0.994 | 0.078 | 1 | No |

Tiebreak-Level Analysis

Tiebreak Level Hypothesis Test Results (H0: p <= 0.5)

| Player | Wins | Decisive Played (n) | P-Value | 95% CI Lower | 95% CI Upper | Significant |
|---|---|---|---|---|---|---|
| Kim | 5 | 5 | 0.013 | 0.649 | 1 | Yes |
| Jone | 4 | 5 | 0.090 | 0.435 | 1 | No |
| Jon | 2 | 4 | 0.500 | 0.182 | 1 | No |
| Anttu | 1 | 3 | 0.718 | 0.078 | 1 | No |
| Jussi | 0 | 1 | 0.841 | 0.000 | 1 | No |
| Tomi | 0 | 2 | 0.921 | 0.000 | 1 | No |
| Antti | 0 | 4 | 0.977 | 0.000 | 1 | No |
| Hastis | 0 | 0 | NA | NA | NA | NA |
| Antun veli | 0 | 0 | NA | NA | NA | NA |

A key observation from these tables is that very few results are statistically significant—only a handful of tests showed a p-value below our 0.05 threshold. This is not surprising and highlights a central theme of this analysis: detecting a small winning edge requires a substantial amount of evidence. As our subsequent power analysis will confirm, many of our tests were not sensitive enough to confidently distinguish a real, small skill difference from random chance.


3. Post-Hoc Power Analysis

A hypothesis test can fail to find a significant result simply because it lacks statistical power. This analysis evaluates the sensitivity of our tests. We define a “meaningfully skilled player” as someone with a true win rate of 55%. The table below shows the probability (power) of our test correctly identifying such a player, given our current sample sizes.
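As a sketch of how these power figures can be computed, the following assumes the arcsine effect-size approximation (Cohen’s h) used by R’s `pwr` package; whether the report used exactly this routine is an assumption, but the sketch reproduces the tabled values for the largest and a small sample.

```python
# Sketch of the post-hoc power calculation: the probability of rejecting
# H0: p <= 0.5 at one-sided alpha = 0.05 when the true win rate is p1.
# Assumes the arcsine effect-size approximation (Cohen's h), as in R's
# pwr package; illustrative, not the report's own code.
import math

def norm_cdf(x):
    """P(Z <= x) for a standard normal variable."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def power_one_prop(p1, n, p0=0.5, z_alpha=1.6449):
    h = 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p0))  # Cohen's h
    return norm_cdf(h * math.sqrt(n) - z_alpha)

print(round(power_one_prop(0.55, 458), 2))  # game level, n = 458 -> 0.69
print(round(power_one_prop(0.55, 18), 2))   # match level, n = 18 -> 0.11
```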

Statistical Power to Detect a 55% Win Rate (against H0: p <= 0.5)

| Player | N (Match) | Power | N (Set) | Power | N (Game) | Power | N (Tiebreak) | Power |
|---|---|---|---|---|---|---|---|---|
| Kim | 18 | 0.11 | 43 | 0.16 | 458 | 0.69 | 5 | 0.08 |
| Jone | 17 | 0.11 | 39 | 0.15 | 444 | 0.68 | 5 | 0.08 |
| Antti | 16 | 0.11 | 38 | 0.15 | 407 | 0.65 | 4 | 0.07 |
| Jon | 14 | 0.10 | 36 | 0.15 | 394 | 0.63 | 4 | 0.07 |
| Anttu | 6 | 0.08 | 15 | 0.10 | 176 | 0.38 | 3 | 0.07 |
| Tomi | 4 | 0.07 | 11 | 0.09 | 127 | 0.30 | 2 | 0.07 |
| Jussi | 2 | 0.07 | 6 | 0.08 | 58 | 0.19 | 1 | 0.06 |
| Hastis | 2 | 0.07 | 2 | 0.07 | 28 | 0.13 | 0 | NA |
| Antun veli | 1 | 0.06 | 2 | 0.07 | 20 | 0.12 | 0 | NA |

The results confirm that our analysis is most reliable at the game-level due to its higher power.


4. Power Curve Visualization

The following plots visualize the relationship between statistical power, effect size, and sample size.

The curve above shows that with our current largest sample size (n ≈ 450 games), the test becomes highly sensitive (power > 80%) as a player’s true win rate approaches 58%.

This second curve simulates a future scenario with more data (n=600), showing that the test would become powerful enough to reliably detect even smaller winning edges (~56%).


5. Bayesian Analysis: Quantifying Uncertainty

As a complementary approach, we use Bayesian inference. This method is ideal for quantifying our certainty given the limited data. We use a Beta-Binomial model with a weakly informative prior (Beta(2, 2)), which assumes a player is likely average before we see their results.

Bayesian Probability Summary

The table below shows the direct probability that each player’s true skill is greater than 50% (P(p > 0.5)). This provides a more intuitive measure of evidence than a p-value.
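These posterior probabilities follow mechanically from the Beta-Binomial model. The sketch below is illustrative Python (the report’s own computation is in R); it reproduces two of Kim’s entries in the table.

```python
# Sketch of the Beta-Binomial update: with a Beta(2, 2) prior, the posterior
# after `wins` and `losses` is Beta(2 + wins, 2 + losses). For integer shape
# parameters the Beta CDF at 0.5 reduces to a binomial tail sum, so no
# stats library is needed.
from math import comb

def prob_above_half(wins, losses, a=2, b=2):
    """Posterior probability that the true win rate p exceeds 0.5."""
    a_post, b_post = a + wins, b + losses
    n = a_post + b_post - 1
    # Identity: P(Beta(a, b) <= 0.5) = P(Binomial(n, 0.5) >= a) for integer a, b
    cdf_at_half = sum(comb(n, k) for k in range(a_post, n + 1)) / 2**n
    return 1 - cdf_at_half

print(round(prob_above_half(5, 0), 3))      # Kim's tiebreaks (5-0) -> 0.965
print(round(prob_above_half(256, 202), 3))  # Kim's games (256-202) -> 0.994
```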

Bayesian Probability of Player Skill Being Above Average (p > 50%)

| Player | Match | Set | Game | Tiebreak |
|---|---|---|---|---|
| Kim | 0.857 | 0.948 | 0.994 | 0.965 |
| Anttu | 0.828 | 0.942 | 0.884 | 0.344 |
| Jone | 0.143 | 0.146 | 0.851 | 0.855 |
| Antun veli | 0.688 | 0.812 | 0.798 | 0.500 |
| Tomi | 0.363 | 0.598 | 0.785 | 0.187 |
| Jussi | 0.812 | 0.746 | 0.601 | 0.313 |
| Hastis | 0.344 | 0.500 | 0.500 | 0.500 |
| Jon | 0.407 | 0.500 | 0.035 | 0.500 |
| Antti | 0.006 | 0.006 | 0.000 | 0.063 |

Player Skill Distributions

Finally, we visualize the full posterior distributions. Wider curves indicate more uncertainty, while narrower curves indicate more certainty. These plots provide the most complete picture of our findings.

Game-Level Distributions

Set-Level Distributions

Match-Level Distributions

Tiebreak-Level Distributions

6. Conclusion of Statistical Analysis

This report serves as a mid-term summary of the project, with the central theme being the critical role of sample size in statistical certainty. This explains our key frequentist finding: despite several players having winning records, very few of these results were found to be statistically significant.

The dual-analysis approach provided a comprehensive picture. While the frequentist tests gave us objective “yes/no” answers on significance, the Bayesian analysis offered a more nuanced view of uncertainty. The Bayesian posterior plots provided the clearest visualization of our conclusions:

  • A player like Hastis, with very little data, has a wide, flat skill distribution that is almost identical to our initial prior belief—we have learned very little about his true skill.
  • In contrast, a player like Kim, with over 450 games, has a much narrower, “peaky” distribution, representing our high degree of certainty that he is a winning player.

This pattern holds across all levels of analysis, confirming that the most reliable insights are derived from the game-level data where our sample size is largest.

A Note on the Test Statistic

The frequentist analysis used a one-proportion z-test. The z-statistic is an intuitive measure of evidence: it counts how many standard errors our observed result (e.g., a 56% win rate) is away from the null hypothesis (a 50% win rate). A large z-score indicates that the result is far enough from 50% that it is unlikely to be due to random chance, leading to a significant p-value.
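As a small worked example (illustrative Python; the analysis itself ran in R), Kim’s game-level record of 256 wins in 458 decisive games gives:

```python
# Worked example of the z-statistic: Kim's game-level record of 256 wins
# in 458 decisive games, tested against the null value p0 = 0.5.
import math

p_hat, p0, n = 256 / 458, 0.5, 458
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
print(round(z, 2))  # the observed 55.9% win rate sits about 2.52 SE above 50%
```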

7. Future Work: Predictive Modeling & Interactive Application

This report constitutes the complete exploratory and inferential analysis phase of the project. The final phase will focus on predictive modeling and productization by developing an R Shiny web application.

The planned features for the application include:

  • Match Outcome Prediction: A predictive model (e.g., a Bradley-Terry model) will be trained on the data to estimate player skill ratings. The app will use these ratings to generate win probabilities for any given matchup.
  • Interactive Player Dashboard: The app will also feature a dashboard where users can select a specific player to view their detailed statistical profile and personalized visualizations, such as performance over time or with different partners.
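As a preview of the modeling idea, the core of a Bradley-Terry predictor is a logistic function of the difference between two latent skill ratings. The sketch below is hypothetical: the ratings are placeholders, not values fitted to the padel data.

```python
# Minimal sketch of the Bradley-Terry idea behind the planned predictor:
# each player i has a latent rating beta_i, and
# P(i beats j) = exp(beta_i) / (exp(beta_i) + exp(beta_j)),
# i.e. a logistic function of the rating difference. The ratings here are
# hypothetical placeholders, not fitted values.
import math

def win_probability(beta_i, beta_j):
    """Bradley-Terry probability that player i beats player j."""
    return 1 / (1 + math.exp(beta_j - beta_i))

print(round(win_probability(0.3, 0.0), 3))  # a small ratings edge -> 0.574
print(win_probability(0.0, 0.0))            # equal ratings -> 0.5
```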