Summary
This article shows how to calculate 'Minutes Played' for every Bayer Leverkusen player in the 2023/2024 Bundesliga season from StatsBomb event data, a basic input for evaluating player contributions and team performance.
Key Points:
- *Precise Player Evaluation*: Accurately quantify player contributions with 'Minutes Played' metrics derived from StatsBomb data. Identify key performers and areas needing improvement.
- *Game Context Analysis*: Understand the impact of game dynamics on player performance by considering factors like formation, opposition strength, and player position for a comprehensive evaluation.
- *Enhanced Decision-Making*: Use 'Minutes Played' data to make informed decisions regarding player selection, substitutions, and tactical adjustments to optimize team performance.
Leverage StatsBomb Data: Uncover Player Contributions with Precision
The StatsBomb dataset provides detailed event data covering every recorded action in Leverkusen's Bundesliga matches. From these records we can work out how long each player was on the field and so calculate minutes played precisely.
To do this, we need to understand how the event data is structured. Each recorded event is linked to a player, a team, and a match, and carries a timestamp within its period. Lineup, substitution, and half-end events together give us the match timeline from which playing time can be attributed to each player and summed over the season.
Let's dive in, focusing exclusively on these three essential libraries.
from statsbombpy import sb
import pandas as pd
from datetime import timedelta
We can now access the open data for the 2023/2024 Bundesliga season through the statsbombpy library. This dataset specifically includes only Bayer Leverkusen's matches. For a more comprehensive dataset, one would need to purchase it from StatsBomb.
# list of matches in that competition
mat = sb.matches(competition_id=9, season_id=281)
In the release article, both the competition ID and season ID are clearly identified. The variable 'mat' encompasses 34 entries, each representing a separate match played by Leverkusen. Upon examining the initial entry, we observe certain details about the game itself; however, there is no information provided that allows us to determine the minutes played.
match_id                                    3895302
match_date                               2024-04-14
kick_off                               17:30:00.000
competition                 Germany - 1. Bundesliga
season                                    2023/2024
home_team                          Bayer Leverkusen
away_team                             Werder Bremen
home_score                                        5
away_score                                        0
match_status                              available
match_status_360                          available
last_updated             2024-05-10T16:57:53.017895
last_updated_360         2024-05-10T17:03:59.613154
match_week                                       29
competition_stage                    Regular Season
stadium                                    BayArena
referee                                 Harm Osmers
home_managers                   Xabier Alonso Olano
away_managers                            Ole Werner
data_version                                  1.1.0
shot_fidelity_version                             2
xy_fidelity_version                               2
To perform this calculation, we require specific event data. This data can be accessed in the following manner:
events_df = sb.competition_events(
    country="Germany",
    division="1. Bundesliga",
    season="2023/2024",
    gender="male",
)
The dataset, referred to as events_df, boasts an impressive 137,765 entries and features a comprehensive array of 108 columns. It's important to note that the majority of these columns are pertinent only to specific event types. Within this extensive dataset, there are 33 distinct categories of events:
Pass                 39214
Ball Receipt*        38215
Carry                32369
Pressure             11419
Ball Recovery         3222
Duel                  2022
Block                 1434
Clearance             1248
Goal Keeper           1102
Shot                   916
Dribble                913
Miscontrol             903
Dispossessed           759
Foul Committed         758
Interception           738
Foul Won               726
Dribbled Past          519
Substitution           301
50/50                  201
Half Start             136
Half End               136
Tactical Shift         127
Injury Stoppage        106
Starting XI             68
Shield                  49
Referee Ball-Drop       38
Bad Behaviour           34
Player Off              27
Player On               27
Error                   21
Offside                 11
Own Goal For             3
Own Goal Against         3
Every pass, block, shot, and other notable action in a match is recorded as an individual row in events_df. For the present analysis, though, we need only a few of these event types. Our primary interest is who took part in each game, that is, the Starting XI and the substitutions made during the match. To be precise, we won't assume that each game runs for a standard 90 minutes. Instead, we'll calculate the actual time each player spends on the field. For this purpose, we also need the Half End events, which tell us how long each half actually lasted, including stoppage time.
Finally, we have to consider expulsions, as they also affect the total minutes played. Even though Leverkusen did not receive any red or second yellow cards throughout the season, we still include these events for the sake of methodological consistency. They can be tracked using the foul_committed_card column, which can hold the values Yellow Card, Second Yellow, or Red Card.
To integrate all elements seamlessly, we filter these events accordingly:
reduced_df = events_df.loc[
    (
        events_df.type.isin(['Starting XI', 'Substitution', 'Half End'])
        | events_df.foul_committed_card.isin(['Red Card', 'Second Yellow'])
    )
    & (events_df.team_id == 904)
].copy()  # copy so that later column assignments do not act on a view of events_df
To put it simply, we keep the event types Starting XI, Substitution, and Half End, plus any event where the foul_committed_card column shows Red Card or Second Yellow. We restrict everything to Leverkusen's own events (their team_id is 904). The timestamp column is crucial for calculating minutes played. At present, these timestamps are strings formatted like 00:06:33.549. To do arithmetic with them, we first convert them to pandas timedelta values:
reduced_df['timestamp'] = pd.to_timedelta(reduced_df['timestamp'])
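To see what the conversion buys us, here is a small self-contained illustration. The timestamp strings are made up for the example and only mimic the format of the event data; as strings they cannot be subtracted or summed, as timedeltas they can.

```python
import pandas as pd

# Illustrative timestamp strings in the same HH:MM:SS.mmm format as the event data
raw = pd.Series(['00:06:33.549', '00:47:12.100'])

# Convert the strings to pandas Timedelta values
ts = pd.to_timedelta(raw)

# Timedeltas support arithmetic, e.g. the gap between the two events in seconds
gap_seconds = (ts.iloc[1] - ts.iloc[0]).total_seconds()

# ...and conversion to minutes
first_in_minutes = ts.iloc[0].total_seconds() / 60
```

The same total_seconds() call is what later turns the accumulated playing time into minutes.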
The next step is optional, but it makes the dataset easier to work with. As mentioned earlier, many of the 108 columns are relevant only to specific event types. Now that our reduced dataset contains just four event types, many columns are empty in every row. To keep things manageable, we remove all columns that contain no data at all:
# remove all columns where ALL values are None
reduced_df = reduced_df.dropna(axis=1, how='all')
This process refines the dataset, making it more suitable for detailed examination. As a result, we have streamlined the information down to just 25 columns:
reduced_df.columns
['duration', 'foul_committed_card', 'id', 'index', 'location',
 'match_id', 'minute', 'off_camera', 'period', 'play_pattern',
 'player', 'player_id', 'position', 'possession', 'possession_team',
 'possession_team_id', 'related_events', 'second',
 'substitution_outcome', 'substitution_replacement', 'tactics',
 'team', 'team_id', 'timestamp', 'type']
The sequence of events is out of order. Let's correct this before we move on to the calculations:
reduced_df = reduced_df.sort_values(by=['match_id','period','timestamp'])
There are several ways to determine how many minutes a player has been on the field. The method I chose works as follows: first, identify the starting lineup for each match; then, measure the exact length of both halves; add these up to get the total playing time and credit it to every player in the starting lineup. If a player is substituted or sent off, the time remaining when they left the field is deducted from their total. To start, we create a loop that iterates through all matches. Recall that the variable 'mat' created at the beginning holds all match data:
# Loop through all matches of the season
for match_id in mat.match_id:
    # Initialize an empty dict to store the game time
    players_match_dict = {}
    # Reduce to events of this match
    match_df = reduced_df[reduced_df.match_id == match_id]
First, we determine the exact length of the match, using the two Half End events recorded for the game. The timestamp restarts at 00:00:00 at the beginning of each half, so the timestamp of a Half End event equals the length of that half. The period column distinguishes the first half from the second. Adding the two values gives the total game time, including stoppage time in both halves.
    # Calculate total game time
    period_1 = match_df[(match_df.type == 'Half End') & (match_df.period == 1)].iloc[0].timestamp
    period_2 = match_df[(match_df.type == 'Half End') & (match_df.period == 2)].iloc[0].timestamp
    total_game_time = period_1 + period_2
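As a worked example of this arithmetic, the sketch below builds a toy stand-in for match_df with made-up timestamps and sums the two half lengths. It is a slight variation on the code above: it groups by period instead of picking each half with iloc, and the name toy_match_df is purely illustrative.

```python
import pandas as pd

# Toy stand-in for match_df: one 'Half End' row per half for a single team.
# The timestamps are made up for illustration.
toy_match_df = pd.DataFrame({
    'type': ['Starting XI', 'Half End', 'Half End'],
    'period': [1, 1, 2],
    'timestamp': pd.to_timedelta(['00:00:00.000', '00:47:12.100', '00:51:03.400']),
})

# Length of each half = timestamp of its 'Half End' event
half_ends = toy_match_df[toy_match_df.type == 'Half End']
half_lengths = half_ends.groupby('period')['timestamp'].max()

# Total game time is the sum over both halves
total_game_time = half_lengths.sum()
total_minutes = total_game_time.total_seconds() / 60
```

With these toy values, a first half of 47:12.1 and a second half of 51:03.4 add up to 98 minutes and 15.5 seconds of game time, noticeably more than the nominal 90.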
The Starting XI event is where the lineup for each team and match is recorded, with only one such event per team per game. By examining the initial lineup event, we find that crucial details are captured in the tactics column, formatted as JSON.
reduced_df[reduced_df.type == 'Starting XI'].iloc[0].tactics
{'formation': 442,
 'lineup': [
  {'player': {'id': 15155, 'name': 'Janis Blaswich'}, 'position': {'id': 1, 'name': 'Goalkeeper'}, 'jersey_number': 21},
  {'player': {'id': 8211, 'name': 'Benjamin Henrichs'}, 'position': {'id': 2, 'name': 'Right Back'}, 'jersey_number': 39},
  {'player': {'id': 30360, 'name': 'Mohamed Simakan'}, 'position': {'id': 3, 'name': 'Right Center Back'}, 'jersey_number': 2},
  {'player': {'id': 8509, 'name': 'Willi Orban'}, 'position': {'id': 5, 'name': 'Left Center Back'}, 'jersey_number': 4},
  {'player': {'id': 12034, 'name': 'David Raum'}, 'position': {'id': 6, 'name': 'Left Back'}, 'jersey_number': 22},
  {'player': {'id': 39460, 'name': 'Nicolas Seiwald'}, 'position': {'id': 9, 'name': 'Right Defensive Midfield'}, 'jersey_number': 13},
  {'player': {'id': 8769, 'name': 'Xaver Schlager'}, 'position': {'id': 11, 'name': 'Left Defensive Midfield'}, 'jersey_number': 24},
  {'player': {'id': 39167, 'name': 'Xavi Simons'}, 'position': {'id': 12, 'name': 'Right Midfield'}, 'jersey_number': 20},
  {'player': {'id': 5557, 'name': 'Timo Werner'}, 'position': {'id': 16, 'name': 'Left Midfield'}, 'jersey_number': 11},
  {'player': {'id': 16275, 'name': 'Ikoma Loïs Openda'}, 'position': {'id': 22, 'name': 'Right Center Forward'}, 'jersey_number': 17},
  {'player': {'id': 16532, 'name': 'Daniel Olmo Carvajal'}, 'position': {'id': 24, 'name': 'Left Center Forward'}, 'jersey_number': 7}]}
From the JSON data, our primary focus is on extracting the player names. To achieve this, we iterate through the 'lineup' field. For each player listed, we log their total game time into a previously established dictionary.
    # Get the initial lineup
    lineup = match_df[match_df.type == 'Starting XI'].iloc[0].tactics
    # Initialize player in match dictionary
    for player in lineup['lineup']:
        players_match_dict[player['player']['name']] = total_game_time
Player Performance Evaluation: Accounting for Game Dynamics
The starters have now been credited with the full game time, which is only correct for players who stay on the pitch until the final whistle. Substitutions and expulsions therefore have to be accounted for by computing the game time remaining at the moment a player leaves or enters the field. Iterating over the match events in order, we deduct the remaining time from a player who goes off and credit it to the player who comes on, so that no game time is counted twice. The result is each player's actual time on the field for the match.
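The article does not show the code for this step, so the following is only a minimal sketch of how the adjustment could look, not necessarily the author's implementation. It works on plain event records (for example rows obtained via match_df.to_dict('records')) and assumes that, on a Substitution event, the player field names the player going off and substitution_replacement the player coming on. The function name playtime_for_match and the toy events are hypothetical.

```python
from datetime import timedelta

def playtime_for_match(events, half_lengths):
    """Return {player name: time on the field} for one team in one match.

    events: event records sorted by period and timestamp.
    half_lengths: {period: length of that half as a timedelta}.
    """
    total = sum(half_lengths.values(), timedelta())
    playtime = {}
    for ev in events:
        if ev['type'] == 'Starting XI':
            # Starters are credited with the full game time up front
            for entry in ev['tactics']['lineup']:
                playtime[entry['player']['name']] = total
            continue
        # Game time left at the moment of this event:
        # rest of the current half plus any later halves
        later_halves = sum(
            (length for period, length in half_lengths.items()
             if period > ev['period']),
            timedelta(),
        )
        remaining = half_lengths[ev['period']] - ev['timestamp'] + later_halves
        if ev['type'] == 'Substitution':
            playtime[ev['player']] -= remaining                   # player going off
            playtime[ev['substitution_replacement']] = remaining  # player coming on
        elif (ev.get('foul_committed_card') in ('Red Card', 'Second Yellow')
              and ev['player'] in playtime):
            playtime[ev['player']] -= remaining                   # player sent off
    return playtime

# Toy match with made-up values: two starters instead of eleven, for brevity
half_lengths = {1: timedelta(minutes=47), 2: timedelta(minutes=50)}
events = [
    {'type': 'Starting XI', 'period': 1, 'timestamp': timedelta(0),
     'tactics': {'lineup': [{'player': {'name': 'Player A'}},
                            {'player': {'name': 'Player B'}}]}},
    {'type': 'Foul Committed', 'period': 1, 'timestamp': timedelta(minutes=40),
     'player': 'Player B', 'foul_committed_card': 'Red Card'},
    {'type': 'Substitution', 'period': 2, 'timestamp': timedelta(minutes=15),
     'player': 'Player A', 'substitution_replacement': 'Player C'},
]
result = playtime_for_match(events, half_lengths)
```

In this toy match of 97 minutes, Player B is sent off with 57 minutes remaining and ends on 40 minutes, Player A goes off with 35 minutes remaining and ends on 62, and Player C is credited with those 35 minutes. Inside the match loop, the same logic would update players_match_dict directly.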
First, we will compile all the data we've gathered for the match into a pandas dataframe. Following this, we'll append a new column to include the match_id, ensuring each row is uniquely identifiable.
    # Create a DataFrame for the current match
    players_match_df = pd.DataFrame(players_match_dict.items(), columns=['player', 'playtime'])
    players_match_df['match_id'] = match_id
Subsequently, we consolidate the match dataframes by integrating them into a comprehensive outer dataframe.
    # Append the current match DataFrame to the main DataFrame
    players_df = pd.concat([players_df, players_match_df], ignore_index=True)
Before entering the match loop, the outer dataframe was initialized as follows:
# Initialize an empty DataFrame to store all players' data
players_df = pd.DataFrame(columns=['player', 'playtime', 'match_id'])
The column that records the minutes played is still in a timedelta format. To make this more interpretable, we need to convert it into minutes using the following command:
# Convert timedelta to minutes
players_df['playtime_minutes'] = players_df['playtime'].apply(lambda x: x.total_seconds() / 60)
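A quick plausibility check is possible at this point. As long as nobody is sent off (which, as noted above, holds for Leverkusen this season), a team has eleven players on the field at all times, so the minutes of all players in a match should add up to eleven times the match length. The sketch below illustrates the idea on toy data; the helper name matches_with_unexpected_totals, the match_minutes mapping, and the tolerance are assumptions made for this example.

```python
import pandas as pd

def matches_with_unexpected_totals(players_df, match_minutes, tolerance=0.01):
    """Return match_ids whose summed playtime differs from 11 x match length.

    match_minutes: {match_id: total game time in minutes}.
    Matches missing from either input are not flagged.
    """
    totals = players_df.groupby('match_id')['playtime_minutes'].sum()
    expected = 11 * pd.Series(match_minutes)
    diff = (totals - expected).abs()
    return diff[diff > tolerance].index.tolist()

# Toy data: ten starters play the full 98 minutes, one is replaced after 60
toy_players_df = pd.DataFrame({
    'player': [f'Starter {i}' for i in range(1, 12)] + ['Substitute 1'],
    'playtime_minutes': [98.0] * 10 + [60.0, 38.0],
    'match_id': [1] * 12,
})
ok = matches_with_unexpected_totals(toy_players_df, {1: 98.0})

# Same data with a deliberately wrong value for the substitute
broken_df = toy_players_df.copy()
broken_df.loc[broken_df.player == 'Substitute 1', 'playtime_minutes'] = 48.0
flagged = matches_with_unexpected_totals(broken_df, {1: 98.0})
```

An empty result means every match adds up; any match_id returned points to a substitution or expulsion that was not booked correctly.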
The players_df dataset includes individual entries for each player and match. To determine the total minutes a player has spent on the field, we can aggregate this data by grouping it by each player and then calculating the sum of minutes played across all matches.
players_df.groupby(['player'])['playtime_minutes'].sum().sort_values(ascending=False)
Here is a comprehensive list of all the players who represented Bayer Leverkusen during the 2023/2024 Bundesliga season, organized by the total minutes they spent on the pitch:
Lukáš Hrádecký                    3215.179200
Granit Xhaka                      3035.305600
Alejandro Grimaldo García         2995.236517
Jonathan Tah                      2848.623300
Florian Wirtz                     2529.249517
Jeremie Frimpong                  2378.883150
Jonas Hofmann                     2335.116433
Edmond Fayçal Tapsoba             2262.941667
Odilon Kossonou                   1962.598533
Exequiel Alejandro Palacios       1950.162133
Robert Andrich                    1848.811367
Victor Okoh Boniface              1653.809500
Piero Martín Hincapié Reyna       1622.655067
Josip Stanišić                    1377.216550
Patrik Schick                     1144.488400
Amine Adli                         928.665950
Nathan Tella                       854.330350
Adam Hložek                        543.913800
Borja Iglesias Quintas             284.789500
Nadiem Amiri                       255.476267
Arthur Augusto de Matos Soares     200.734417
Matěj Kovář                         95.490000
Gustavo Adolfo Puerta Molano        70.897183
Noah Mbamba                         22.786800
Throughout the course of the season, a total of 24 different players were featured. Unsurprisingly, goalkeeper Hrádecký logged the most minutes on the field, playing in every match except one where Kovář stepped in. Xhaka made appearances in 33 out of 34 games, amassing the second-highest number of minutes on the team and leading all outfield players in this regard. The collected data lays a foundation for further analysis that will be shared over the coming weeks. Stay tuned!