Unlocking Leverkusen's Stellar 2023/2024 Season: How to Calculate 'Minutes Played' Using StatsBomb Data


Summary

Unlock Leverkusen's stellar 2023/2024 season by mastering how to calculate 'Minutes Played' using StatsBomb data, crucial for evaluating player contributions and improving team performance.

Key Points:

  • *Precise Player Evaluation*: Accurately quantify player contributions with 'Minutes Played' metrics derived from StatsBomb data. Identify key performers and areas needing improvement.
  • *Game Context Analysis*: Understand the impact of game dynamics on player performance by considering factors like formation, opposition strength, and player position for a comprehensive evaluation.
  • *Enhanced Decision-Making*: Use 'Minutes Played' data to make informed decisions regarding player selection, substitutions, and tactical adjustments to optimize team performance.
This guide helps you leverage 'Minutes Played' data from StatsBomb to evaluate players, understand game context, and enhance decision-making for Leverkusen's successful season.


Leverage StatsBomb Data: Uncover Player Contributions with Precision

The StatsBomb dataset offers a treasure trove of event data, meticulously capturing every significant action during Leverkusen's Bundesliga matches. By delving into these detailed records, we can ascertain the exact duration each player spent on the field, thereby calculating their minutes played with precision.

To achieve this, it's crucial to comprehend the hierarchical structure of the StatsBomb event data. Each recorded event is linked to a specific player, and understanding these connections allows us to accurately aggregate events for individual players. This process involves analyzing the relationships between events, players, and match timelines to ensure precise attribution of playing time. Through this methodical approach, we can reliably track and evaluate players' contributions throughout the season.

Let's dive in, focusing exclusively on these three essential libraries.
from statsbombpy import sb
import pandas as pd
from datetime import timedelta

We can now access the open data for the 2023/2024 Bundesliga season through the statsbombpy library. This dataset specifically includes only Bayer Leverkusen's matches. For a more comprehensive dataset, one would need to purchase it from StatsBomb.
# list of matches in that competition
mat = sb.matches(competition_id=9, season_id=281)

In the release article, both the competition ID and season ID are clearly identified. The variable 'mat' encompasses 34 entries, each representing a separate match played by Leverkusen. Upon examining the initial entry, we observe certain details about the game itself; however, there is no information provided that allows us to determine the minutes played.
match_id                                    3895302
match_date                               2024-04-14
kick_off                               17:30:00.000
competition                 Germany - 1. Bundesliga
season                                    2023/2024
home_team                          Bayer Leverkusen
away_team                             Werder Bremen
home_score                                        5
away_score                                        0
match_status                              available
match_status_360                          available
last_updated             2024-05-10T16:57:53.017895
last_updated_360         2024-05-10T17:03:59.613154
match_week                                       29
competition_stage                    Regular Season
stadium                                    BayArena
referee                                 Harm Osmers
home_managers                   Xabier Alonso Olano
away_managers                            Ole Werner
data_version                                  1.1.0
shot_fidelity_version                             2
xy_fidelity_version                               2

To perform this calculation, we require specific event data. This data can be accessed in the following manner:
events_df = sb.competition_events(
    country="Germany",
    division="1. Bundesliga",
    season="2023/2024",
    gender="male"
)

The dataset, referred to as events_df, boasts an impressive 137,765 entries and features a comprehensive array of 108 columns. It's important to note that the majority of these columns are pertinent only to specific event types. Within this extensive dataset, there are 33 distinct categories of events:
Pass                 39214
Ball Receipt*        38215
Carry                32369
Pressure             11419
Ball Recovery         3222
Duel                  2022
Block                 1434
Clearance             1248
Goal Keeper           1102
Shot                   916
Dribble                913
Miscontrol             903
Dispossessed           759
Foul Committed         758
Interception           738
Foul Won               726
Dribbled Past          519
Substitution           301
50/50                  201
Half Start             136
Half End               136
Tactical Shift         127
Injury Stoppage        106
Starting XI             68
Shield                  49
Referee Ball-Drop       38
Bad Behaviour           34
Player Off              27
Player On               27
Error                   21
Offside                 11
Own Goal For             3
Own Goal Against         3

Every pass, block, shot, and other notable action in a match is stored as an individual row in events_df. For our present analysis, not all of these events are needed. Our primary focus is identifying who took part in each game, specifically the Starting XI and the substitutions made during the match. To be precise, we won't assume that each game runs for a standard 90 minutes. Instead, we'll calculate the exact time each player spends on the field. For this purpose, we also need the Half End events, which tell us how long each half actually lasted.

Finally, expulsions also affect the total minutes played. Even though Leverkusen did not receive any red or second yellow cards throughout the season, we still include these occurrences for the sake of methodological consistency. They can be tracked using the foul_committed_card column, which may contain Yellow Card, Red Card, or Second Yellow.

To integrate all elements seamlessly, we filter these events accordingly:
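# .copy() gives us an independent dataframe, so adding or changing
# columns later does not trigger a SettingWithCopyWarning
reduced_df = events_df.loc[
    (
        (events_df.type.isin(['Starting XI', 'Substitution', 'Half End'])) |
        (events_df.foul_committed_card.isin(['Red Card', 'Second Yellow']))
    ) &
    (events_df.team_id == 904)
].copy()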

To put it simply, we keep the Starting XI, Substitution, and Half End events, plus any event whose foul_committed_card column shows Red Card or Second Yellow, and we restrict everything to Leverkusen (team_id 904). The timestamp column is crucial for calculating minutes played. At present, these timestamps are strings formatted like 00:06:33.549. To do arithmetic on them, we first convert them into pandas timedelta values:
reduced_df['timestamp'] = pd.to_timedelta(reduced_df['timestamp'])
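To see what the conversion buys us, here is a minimal sketch on a few invented timestamp strings (the values are made up for illustration and are not taken from the dataset). It only relies on pd.to_timedelta and the .dt.total_seconds() accessor.

```python
import pandas as pd

# Invented timestamps in the "HH:MM:SS.mmm" string format described above
raw = pd.Series(["00:06:33.549", "00:45:00.000", "00:47:12.250"])

# pd.to_timedelta parses the strings into Timedelta values
ts = pd.to_timedelta(raw)

# Timedeltas can be subtracted from each other ...
gap = ts.iloc[2] - ts.iloc[1]

# ... and converted to plain numbers, here minutes
minutes = ts.dt.total_seconds() / 60

print(gap)
print(minutes.round(2).tolist())
```

Strings, by contrast, can only be compared or concatenated, which is why the conversion has to happen before any playing time is computed.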

While this next step isn't strictly necessary, it makes the dataset easier to work with. As noted earlier, most of the 108 columns are relevant only to certain event types. The filtered dataset contains just four event types, so many columns are empty in every row. We drop those completely empty columns:
# remove all columns where ALL values are None
reduced_df = reduced_df.dropna(axis=1, how='all')

This process refines the dataset, making it more suitable for detailed examination. As a result, we have streamlined the information down to just 25 columns:
reduced_df.columns

['duration', 'foul_committed_card', 'id', 'index', 'location',
 'match_id', 'minute', 'off_camera', 'period', 'play_pattern', 'player',
 'player_id', 'position', 'possession', 'possession_team',
 'possession_team_id', 'related_events', 'second',
 'substitution_outcome', 'substitution_replacement', 'tactics', 'team',
 'team_id', 'timestamp', 'type']

The sequence of events is out of order. Let's correct this before we move on to the calculations:
reduced_df = reduced_df.sort_values(by=['match_id','period','timestamp'])

There are various ways to determine how many minutes a player has been on the field. My chosen method involves several steps: first, identify the starting lineup for each match; then, measure the exact length of both halves; add them up to get the total playing time and credit that time to everyone in the starting lineup. If a player is substituted or sent off, the time remaining at that moment is deducted from their total, and a substitute coming on is credited with it. To start, we need a loop that iterates through all matches. Recall that we initially created a variable named 'mat' which holds all match data:
# Loop through all matches of the season
for match_id in mat.match_id:

    # Initialize an empty dict to store the game time
    players_match_dict = {}

    # Reduce to events of this match
    match_df = reduced_df[reduced_df.match_id == match_id]

First, we determine the actual length of the match from the two Half End events. The timestamp starts at 00:00:00 in both halves, so the Half End timestamp of each period is the length of that half, and the total game time is the sum of the two. The period column distinguishes the first half from the second.
    # Calculate total game time
    period_1 = match_df[(match_df.type == 'Half End') & (match_df.period == 1)].iloc[0].timestamp
    period_2 = match_df[(match_df.type == 'Half End') & (match_df.period == 2)].iloc[0].timestamp
    total_game_time = period_1 + period_2
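As a worked example of this arithmetic, the sketch below uses two invented Half End rows (the timestamps are assumptions, not real match data). It is a variation on the snippet above rather than the author's code: instead of picking the first row per period, it takes the latest Half End timestamp in each period and sums over all periods found.

```python
import pandas as pd

# Invented Half End rows for a single match, for illustration only
match_df = pd.DataFrame({
    "type": ["Half End", "Half End"],
    "period": [1, 2],
    "timestamp": pd.to_timedelta(["00:47:10.000", "00:51:35.000"]),
})

half_ends = match_df[match_df.type == "Half End"]

# Length of each period = latest Half End timestamp in that period
period_lengths = half_ends.groupby("period")["timestamp"].max()

# Total game time = sum of all period lengths
total_game_time = period_lengths.sum()
total_minutes = total_game_time.total_seconds() / 60

print(total_game_time)  # 47:10 + 51:35
print(total_minutes)
```

Grouping by period gives the same result as the two explicit lookups when there are exactly two halves, and it would not need changing if a dataset contained additional periods.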

The Starting XI event is where the lineup for each team and match is recorded, with only one such event per team per game. By examining the initial lineup event, we find that crucial details are captured in the tactics column, formatted as JSON.
reduced_df[reduced_df.type == 'Starting XI'].iloc[0].tactics

{'formation': 442,
 'lineup': [
  {'player': {'id': 15155, 'name': 'Janis Blaswich'},
   'position': {'id': 1, 'name': 'Goalkeeper'},
   'jersey_number': 21},
  {'player': {'id': 8211, 'name': 'Benjamin Henrichs'},
   'position': {'id': 2, 'name': 'Right Back'},
   'jersey_number': 39},
  {'player': {'id': 30360, 'name': 'Mohamed Simakan'},
   'position': {'id': 3, 'name': 'Right Center Back'},
   'jersey_number': 2},
  {'player': {'id': 8509, 'name': 'Willi Orban'},
   'position': {'id': 5, 'name': 'Left Center Back'},
   'jersey_number': 4},
  {'player': {'id': 12034, 'name': 'David Raum'},
   'position': {'id': 6, 'name': 'Left Back'},
   'jersey_number': 22},
  {'player': {'id': 39460, 'name': 'Nicolas Seiwald'},
   'position': {'id': 9, 'name': 'Right Defensive Midfield'},
   'jersey_number': 13},
  {'player': {'id': 8769, 'name': 'Xaver Schlager'},
   'position': {'id': 11, 'name': 'Left Defensive Midfield'},
   'jersey_number': 24},
  {'player': {'id': 39167, 'name': 'Xavi Simons'},
   'position': {'id': 12, 'name': 'Right Midfield'},
   'jersey_number': 20},
  {'player': {'id': 5557, 'name': 'Timo Werner'},
   'position': {'id': 16, 'name': 'Left Midfield'},
   'jersey_number': 11},
  {'player': {'id': 16275, 'name': 'Ikoma Loïs Openda'},
   'position': {'id': 22, 'name': 'Right Center Forward'},
   'jersey_number': 17},
  {'player': {'id': 16532, 'name': 'Daniel Olmo Carvajal'},
   'position': {'id': 24, 'name': 'Left Center Forward'},
   'jersey_number': 7}]}

From the JSON data, our primary focus is on extracting the player names. To achieve this, we iterate through the 'lineup' field. For each player listed, we log their total game time into a previously established dictionary.
    # Get the initial lineup
    lineup = match_df[match_df.type == 'Starting XI'].iloc[0].tactics

    # Initialize player in match dictionary
    for player in lineup['lineup']:
        players_match_dict[player['player']['name']] = total_game_time

Player Performance Evaluation: Accounting for Game Dynamics

The accurate computation of remaining game time plays a pivotal role in the analysis of player performance and team tactics. By meticulously accounting for substitutions and expulsions, it becomes possible to evaluate each player's contributions within the correct timeframe. This precise tracking ensures that assessments reflect the true impact of each player's participation.

Moreover, handling player exits and entries involves intricate complexities that demand careful consideration. Through an iterative process, we can accurately monitor player involvement without double-counting game time. This method provides a comprehensive understanding of player participation and their overall impact on the match.
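The code for this adjustment step is not shown in the text, so what follows is only a minimal sketch of one way it could look, under stated assumptions. The function name apply_exits_and_entries and the toy events are hypothetical; the column names (type, period, timestamp, player, substitution_replacement, foul_committed_card) are the ones listed for reduced_df earlier. Every starter begins with the full game time, the time remaining is deducted when a player leaves, and a substitute is credited with the time remaining at entry.

```python
import pandas as pd

def apply_exits_and_entries(match_df, playtime, period_1, total_game_time):
    """Adjust playtime for substitutions and expulsions (illustrative sketch)."""
    playtime = dict(playtime)  # work on a copy of the starters' dictionary
    for _, ev in match_df.iterrows():
        is_sub = ev["type"] == "Substitution"
        is_red = ev["foul_committed_card"] in ("Red Card", "Second Yellow")
        if not (is_sub or is_red):
            continue
        # Timestamps start at 00:00:00 in each half, so second-half
        # events need the length of the first half added on top
        offset = period_1 if ev["period"] == 2 else pd.Timedelta(0)
        remaining = total_game_time - (ev["timestamp"] + offset)
        # The player leaving the pitch loses the time still to be played
        if ev["player"] in playtime:
            playtime[ev["player"]] -= remaining
        # A substitute is credited with exactly that remaining time
        if is_sub:
            playtime[ev["substitution_replacement"]] = remaining
    return playtime

# Toy match: halves of 47 and 50 minutes, three starters (invented data)
period_1, period_2 = pd.Timedelta(minutes=47), pd.Timedelta(minutes=50)
total = period_1 + period_2
starters = {name: total for name in ["A", "B", "C"]}
events = pd.DataFrame({
    "type": ["Substitution", "Substitution", "Foul Committed"],
    "period": [2, 2, 2],
    "timestamp": pd.to_timedelta(["00:15:00", "00:40:00", "00:45:00"]),
    "player": ["A", "D", "B"],
    "substitution_replacement": ["D", "E", None],
    "foul_committed_card": [None, None, "Second Yellow"],
})
result = apply_exits_and_entries(events, starters, period_1, total)
for name, value in result.items():
    print(name, value.total_seconds() / 60)
```

Because a substitute's entry and later exit are both expressed as "time remaining", a player who comes on and is then taken off again (player D in the toy data) ends up with only the span actually spent on the pitch. The events must be in chronological order for this to work, which the earlier sort by match_id, period, and timestamp provides.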

Handling substitutions and expulsions this way improves the accuracy of both individual and team-level evaluations.

Once the playtime for a match is settled, we compile it into a pandas dataframe and add a match_id column so that each row can be traced back to its match.
    # Create a DataFrame for the current match
    players_match_df = pd.DataFrame(players_match_dict.items(), columns=['player', 'playtime'])
    players_match_df['match_id'] = match_id

Subsequently, we consolidate the match dataframes by integrating them into a comprehensive outer dataframe.
    # Append the current match DataFrame to the main DataFrame
    players_df = pd.concat([players_df, players_match_df], ignore_index=True)

Before the match loop starts, this outer dataframe has to be initialized as follows:
# Initialize an empty DataFrame to store all players' data
players_df = pd.DataFrame(columns=['player', 'playtime', 'match_id'])

The column that records the minutes played is still in a timedelta format. To make this more interpretable, we need to convert it into minutes using the following command:
# Convert timedelta to minutes
players_df['playtime_minutes'] = players_df['playtime'].apply(lambda x: x.total_seconds() / 60)

The players_df dataset includes individual entries for each player and match. To determine the total minutes a player has spent on the field, we can aggregate this data by grouping it by each player and then calculating the sum of minutes played across all matches.
players_df.groupby(['player'])['playtime_minutes'].sum().sort_values(ascending=False)
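Before reading too much into the totals, a quick plausibility check can be useful. The sketch below is an optional addition, not part of the original workflow, and runs on invented rows shaped like players_df. The match_minutes series is a hypothetical helper holding each match's length in minutes, which could be derived from the Half End events.

```python
import pandas as pd

# Invented per-match rows in the same shape as players_df
players_df = pd.DataFrame({
    "player": ["A", "B", "C", "A", "D"],
    "match_id": [1, 1, 1, 2, 2],
    "playtime_minutes": [97.0, 62.0, 35.0, 95.5, 95.5],
})

# Hypothetical match lengths in minutes, indexed by match_id
match_minutes = pd.Series({1: 97.0, 2: 95.5})

per_match = players_df.groupby("match_id")["playtime_minutes"].agg(["sum", "max"])
lengths = match_minutes.reindex(per_match.index)

# Check 1: nobody can play longer than the match lasted
too_long = per_match[per_match["max"] > lengths + 1e-9]

# Check 2: player-minutes divided by match length is the average number
# of players on the pitch (2 in this toy data, 11 for a real team that
# finishes the match without an expulsion)
avg_on_pitch = per_match["sum"] / lengths

print(too_long.empty)
print(avg_on_pitch.round(2).tolist())
```

With real data, a value above 11 in the second check would point to double-counted time, and a value below 11 to an expulsion or a missed substitution.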

Here is a comprehensive list of all the players who represented Bayer Leverkusen during the 2023/2024 Bundesliga season, organized by the total minutes they spent on the pitch:
Lukáš Hrádecký                    3215.179200
Granit Xhaka                      3035.305600
Alejandro Grimaldo García         2995.236517
Jonathan Tah                      2848.623300
Florian Wirtz                     2529.249517
Jeremie Frimpong                  2378.883150
Jonas Hofmann                     2335.116433
Edmond Fayçal Tapsoba             2262.941667
Odilon Kossonou                   1962.598533
Exequiel Alejandro Palacios       1950.162133
Robert Andrich                    1848.811367
Victor Okoh Boniface              1653.809500
Piero Martín Hincapié Reyna       1622.655067
Josip Stanišić                    1377.216550
Patrik Schick                     1144.488400
Amine Adli                         928.665950
Nathan Tella                       854.330350
Adam Hložek                        543.913800
Borja Iglesias Quintas             284.789500
Nadiem Amiri                       255.476267
Arthur Augusto de Matos Soares     200.734417
Matěj Kovář                         95.490000
Gustavo Adolfo Puerta Molano        70.897183
Noah Mbamba                         22.786800

Throughout the course of the season, a total of 24 different players were featured. Unsurprisingly, goalkeeper Hrádecký logged the most minutes on the field, playing in every match except one where Kovář stepped in. Xhaka made appearances in 33 out of 34 games, amassing the second-highest number of minutes on the team and leading all outfield players in this regard. The collected data lays a foundation for further analysis that will be shared over the coming weeks. Stay tuned!


D.S.
