📊 Sectioned Summary Table Visualization¶
This notebook demonstrates how to use the sectioned_summary_table_viz.py tool. This visualization is designed to show a breakdown of multiple metrics across different categories (sections) for a specific group of players, using percentile ranks to compare them against the rest of the dataset.
📋 Step 1: Libraries and Path Setup¶
In [2]:
Copied!
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sys
import os
# Add the src directory to the path so we can import the visualization module
sys.path.append(os.path.abspath('../../../src'))
from visualization.sectioned_summary_table_viz import ranking_plot
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sys
import os
# Add the src directory to the path so we can import the visualization module
sys.path.append(os.path.abspath('../../../src'))
from visualization.sectioned_summary_table_viz import ranking_plot
📥 Step 2: Load Data¶
We will load the AUS A-League physical aggregates for the 2024/2025 season.
In [3]:
Copied!
data_path = '../../../data/aggregates/aus1league_physicalaggregates_20242025.csv'
df = pd.read_csv(data_path)
print(f"Loaded {len(df)} rows")
df.head()
data_path = '../../../data/aggregates/aus1league_physicalaggregates_20242025.csv'
df = pd.read_csv(data_path)
print(f"Loaded {len(df)} rows")
df.head()
Loaded 406 rows
Out[3]:
| player_name | player_short_name | player_id | player_birthdate | team_name | team_id | competition_name | competition_id | season_name | season_id | ... | sprint_distance_full_otip | sprint_count_full_otip | hi_distance_full_otip | hi_count_full_otip | medaccel_count_full_otip | highaccel_count_full_otip | meddecel_count_full_otip | highdecel_count_full_otip | explacceltohsr_count_full_otip | explacceltosprint_count_full_otip | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Adam Taggart | A. Taggart | 211 | 1993-06-02 | Perth Glory Football Club | 871 | AUS - A-League | 61 | 2024/2025 | 95 | ... | 48.46 | 3.00 | 297.83 | 27.21 | 42.38 | 2.67 | 38.25 | 6.96 | 1.92 | 0.08 |
| 1 | Adama Traoré | A. Traoré | 218 | 1990-02-03 | Melbourne Victory Football Club | 868 | AUS - A-League | 61 | 2024/2025 | 95 | ... | 69.67 | 4.00 | 315.33 | 25.00 | 37.83 | 2.00 | 25.67 | 4.50 | 1.17 | 0.50 |
| 2 | Dino Arslanagić | D. Arslanagić | 2759 | 1993-04-24 | Macarthur FC | 1804 | AUS - A-League | 61 | 2024/2025 | 95 | ... | 29.12 | 1.25 | 223.25 | 16.88 | 45.12 | 1.75 | 30.75 | 3.50 | 1.25 | 0.00 |
| 3 | Douglas Costa de Souza | Douglas Costa | 2858 | 1990-09-14 | Sydney Football Club | 869 | AUS - A-League | 61 | 2024/2025 | 95 | ... | 28.00 | 2.00 | 227.25 | 17.50 | 28.75 | 2.00 | 25.00 | 3.00 | 0.75 | 0.25 |
| 4 | Douglas Costa de Souza | Douglas Costa | 2858 | 1990-09-14 | Sydney Football Club | 869 | AUS - A-League | 61 | 2024/2025 | 95 | ... | 1.00 | 0.00 | 170.00 | 16.00 | 26.00 | 0.00 | 21.00 | 2.00 | 0.00 | 0.00 |
5 rows × 65 columns
📂 Step 3: Normalization and Filtering¶
Matching the approach in the Z-score tutorial, we perform the following:
- Filter for players with at least 5 matches.
- Normalize cumulative metrics to Per 90 (P90) to allow fair comparison.
In [19]:
Copied!
# Initial filtering
df_filtered = df[(df['count_match'] >= 5) & (df['minutes_full_all'] > 60)].copy()
# Normalize key metrics to P90
metrics_to_normalize = {
'hi_count_full_all': 'hi_count_p90',
'sprint_distance_full_all': 'sprint_distance_p90',
'total_distance_full_all': 'total_dist_p90',
'running_distance_full_all': 'running_dist_p90',
'highaccel_count_full_all': 'highaccel_p90',
'explacceltosprint_count_full_all': 'explacceltosprint_p90'
}
for raw_col, p90_col in metrics_to_normalize.items():
df_filtered[p90_col] = (df_filtered[raw_col] / df_filtered['minutes_full_all']) * 90
print(f"Data filtered to {len(df_filtered)} records and normalized.")
# Initial filtering
df_filtered = df[(df['count_match'] >= 5) & (df['minutes_full_all'] > 60)].copy()
# Normalize key metrics to P90
metrics_to_normalize = {
'hi_count_full_all': 'hi_count_p90',
'sprint_distance_full_all': 'sprint_distance_p90',
'total_distance_full_all': 'total_dist_p90',
'running_distance_full_all': 'running_dist_p90',
'highaccel_count_full_all': 'highaccel_p90',
'explacceltosprint_count_full_all': 'explacceltosprint_p90'
}
for raw_col, p90_col in metrics_to_normalize.items():
df_filtered[p90_col] = (df_filtered[raw_col] / df_filtered['minutes_full_all']) * 90
print(f"Data filtered to {len(df_filtered)} records and normalized.")
Data filtered to 210 records and normalized.
📊 Step 4: Visualization Setup¶
We need to define:
- Questions: A dictionary mapping section labels to lists of metrics.
- Highlight Group: The list of players we want to display in the table.
In [24]:
Copied!
# Sectioned metrics definition
questions = {
'Intensity': ['hi_count_p90', 'sprint_distance_p90'],
'Volume': ['total_dist_p90', 'total_metersperminute_full_tip','total_metersperminute_full_otip'],
'Explosivity': ['highaccel_p90', 'explacceltosprint_p90']
}
# Metric labels for cleaner output
metric_labels = {
'hi_count_p90': 'HI Count',
'sprint_distance_p90': 'Sprint Distance',
'total_dist_p90': 'Total Distance',
'total_metersperminute_full_tip': 'Meters Per Minute TIP',
'total_metersperminute_full_otip': 'Meters Per Minute OTIP',
'highaccel_p90': 'High Accels',
'explacceltosprint_p90': 'Explosive Accels'
}
# Select a group of players for comparison (e.g., players from Sydney FC)
highlight_players = df_filtered[df_filtered['team_name'] == 'Sydney Football Club']['player_name'].unique()[:6]
# Sectioned metrics definition
questions = {
'Intensity': ['hi_count_p90', 'sprint_distance_p90'],
'Volume': ['total_dist_p90', 'total_metersperminute_full_tip','total_metersperminute_full_otip'],
'Explosivity': ['highaccel_p90', 'explacceltosprint_p90']
}
# Metric labels for cleaner output
metric_labels = {
'hi_count_p90': 'HI Count',
'sprint_distance_p90': 'Sprint Distance',
'total_dist_p90': 'Total Distance',
'total_metersperminute_full_tip': 'Meters Per Minute TIP',
'total_metersperminute_full_otip': 'Meters Per Minute OTIP',
'highaccel_p90': 'High Accels',
'explacceltosprint_p90': 'Explosive Accels'
}
# Select a group of players for comparison (e.g., players from Sydney FC)
highlight_players = df_filtered[df_filtered['team_name'] == 'Sydney Football Club']['player_name'].unique()[:6]
🎨 Step 5: Render the Sectioned Summary Table¶
In [25]:
Copied!
fig, ax = ranking_plot(
df=df_filtered,
questions=questions,
highlight_group=highlight_players,
metric_labels=metric_labels,
figsize=(14, 8)
)
fig, ax = ranking_plot(
df=df_filtered,
questions=questions,
highlight_group=highlight_players,
metric_labels=metric_labels,
figsize=(14, 8)
)
Alternative Visualization: Bubble Mode¶
We can also use user_circles=True to render the ranks as expanding bubbles instead of bars.
In [26]:
Copied!
fig, ax = ranking_plot(
df=df_filtered,
questions=questions,
highlight_group=highlight_players,
metric_labels=metric_labels,
user_circles=True,
figsize=(14, 8)
)
fig, ax = ranking_plot(
df=df_filtered,
questions=questions,
highlight_group=highlight_players,
metric_labels=metric_labels,
user_circles=True,
figsize=(14, 8)
)
In [ ]:
Copied!