Analysis 2023

Goals of this notebook

We’ll explore MLS salaries through history, starting with the most recent data and then looking back. A few questions that come to mind:

  • Which players are getting paid the most this year?
  • Which teams have the highest salary bill this year?
  • How do team salary rankings compare over time?

Per the MLS Players Association, “compensation” is defined as follows: “The Annual Average Guaranteed Compensation (Guaranteed Comp) number includes a player’s base salary and all signing and guaranteed bonuses annualized over the term of the player’s contract, including option years.”
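To make that definition concrete, here is a small sketch that subtracts base pay from compensation, which under the definition above should leave the annualized bonus portion. The rows are the first six 2007 Chicago values visible in the glimpse() output in this notebook; `sample_rows` and `guaranteed_extra` are names I made up for the illustration, not columns in the real data.

```r
library(dplyr)
library(tibble)

# First six 2007 Chicago rows, copied from the glimpse() output in this notebook
sample_rows <- tribble(
  ~last_name, ~base_salary, ~compensation,
  "Armas",    225000.0,     225000.0,
  "Banner",   12900.0,      12900.0,
  "Barrett",  41212.5,      48712.5,
  "Blanco",   2492316.0,    2666778.0,
  "Brown",    106391.0,     106391.0,
  "Busch",    58008.0,      58008.0
)

# guaranteed_extra is a hypothetical column: whatever compensation adds on top of base pay
sample_rows <- sample_rows |>
  mutate(guaranteed_extra = compensation - base_salary)

# Only players whose compensation exceeds their base salary
sample_rows |> filter(guaranteed_extra > 0)
```

In this sample only Barrett and Blanco have compensation above their base salary; for the others the two numbers are identical.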

Setup

library(tidyverse)
library(janitor)
library(scales)
library(ggrepel)
library(DT)

Import

Getting the cleaned data.

salaries <- read_rds("data-processed/mls-salaries.rds")

salaries |> glimpse()
Rows: 10,375
Columns: 9
$ year         <chr> "2007", "2007", "2007", "2007", "2007", "2007", "2007", "…
$ club_short   <chr> "CHI", "CHI", "CHI", "CHI", "CHI", "CHI", "CHI", "CHI", "…
$ last_name    <chr> "Armas", "Banner", "Barrett", "Blanco", "Brown", "Busch",…
$ first_name   <chr> "Chris", "Michael", "Chad", "Cuauhtemoc", "C.J.", "Jon", …
$ position     <chr> "M", "M", "F", "F", "D", "GK", "F", "D", "M", "D", "D", "…
$ base_salary  <dbl> 225000.0, 12900.0, 41212.5, 2492316.0, 106391.0, 58008.0,…
$ compensation <dbl> 225000.0, 12900.0, 48712.5, 2666778.0, 106391.0, 58008.0,…
$ club_long    <chr> "Chicago Fire FC", "Chicago Fire FC", "Chicago Fire FC", …
$ conference   <chr> "Eastern", "Eastern", "Eastern", "Eastern", "Eastern", "E…

Setting the most recent year of data

I’m creating an object called recent_year because at some point I’ll have new data and might want to just change the year.

recent_year <- "2023"
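An alternative, sketched here on a toy tibble rather than the real file, is to derive the most recent year from the data itself so nothing needs editing when new data arrives. The names `toy_salaries` and `recent_year_auto` are hypothetical. The sketch assumes year stays a four-digit character string, as it is in the glimpse() output, because max() on character values compares strings.

```r
library(dplyr)
library(tibble)

# Toy stand-in for the salaries data; year is stored as character, as in the real file
toy_salaries <- tibble(
  year = c("2021", "2022", "2023", "2023"),
  compensation = c(100000, 200000, 300000, 400000)
)

# max() on character compares strings, which orders four-digit years correctly
recent_year_auto <- toy_salaries |> pull(year) |> max()

# Use it the same way as a hand-set recent_year
toy_salaries |> filter(year == recent_year_auto)
```

The trade-off is that a hand-set value makes it easy to re-run the notebook for an older season, while the derived value always follows the newest data.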

Players with highest salaries

Over all time

A searchable table of all players, all time.

sal_high <- salaries |> 
  arrange(compensation |> desc()) |> 
  select(!c(club_long, conference))

sal_high |> datatable()

Data takeaway: Messi money

Upon his signing on July 15, 2023, Lionel Messi became the highest-paid player in the history of the MLS with a total compensation of $20.4 million. Lorenzo Insigne of Toronto was second at $15.5 million, the only other player to earn more than $10 million in a single year.

In most recent year

sal_high_recent <- salaries |> 
  filter(year == max(year)) |> 
  arrange(compensation |> desc()) |> 
  select(!c(club_long, conference))

sal_high_recent |> 
  filter(compensation > 4000000)

Data takeaway: Driussi 6th in 2023

For 2023, Austin FC’s Sebastián Driussi is the 6th highest-paid player in the MLS and Houston’s Hector Herrera is the 7th.

Several teams have two top 10 earners on their rosters: Inter Miami, Toronto FC and the L.A. Galaxy.

Difference with just base pay in 2023?

This looks at just the base salary as opposed to total compensation. No great changes at the top of the list.

sal_high_base <- salaries |> 
  arrange(base_salary |> desc()) |> 
  select(!c(club_long, conference, compensation))

sal_high_base |> filter(year == recent_year, base_salary >= 2000000) 

Team salaries

We’ll get per-year salaries by team, then look at just this year.

Highest team salaries over time

sal_team <- salaries |> 
  group_by(year, club_long) |> 
  # .groups = "drop" returns an ungrouped result and silences the grouping message
  summarise(total_compensation = sum(compensation), .groups = "drop") |> 
  arrange(total_compensation |> desc())
# peek at the top
sal_team |> head(10)

And who has been the top spending team each year?

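top_sal_team_yr <- salaries |> 
  group_by(year, club_long) |> 
  # keep the result grouped by year so slice_max() picks the top club within each year
  summarise(total_compensation = sum(compensation), .groups = "drop_last") |> 
  slice_max(total_compensation)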
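One thing to keep in mind with slice_max(): by default it keeps ties, so a year with two clubs at exactly the same total would contribute two rows to the count. The sketch below shows the difference on made-up totals; the clubs, numbers, and object names are all hypothetical.

```r
library(dplyr)
library(tibble)

# Made-up team totals with a tie for first place in 2022
toy_team_totals <- tribble(
  ~year,  ~club_long, ~total_compensation,
  "2022", "Club A",   10,
  "2022", "Club B",   10,
  "2022", "Club C",   5,
  "2023", "Club A",   8,
  "2023", "Club B",   12
)

# Default behavior (with_ties = TRUE): both 2022 leaders are kept
top_with_ties <- toy_team_totals |>
  group_by(year) |>
  slice_max(total_compensation, n = 1) |>
  ungroup()

# with_ties = FALSE: exactly one row per year
top_no_ties <- toy_team_totals |>
  group_by(year) |>
  slice_max(total_compensation, n = 1, with_ties = FALSE) |>
  ungroup()

top_with_ties
top_no_ties
```

With real salary totals an exact tie is unlikely, so this matters mostly as a sanity check that the result has one row per year.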

top_sal_team_yr |> 
  ungroup() |> 
  tabyl(club_long) |> 
  adorn_totals("row") |>
  adorn_pct_formatting() |> 
  as_tibble()

Data takeaways: Toronto historically spends high

Looking at the most expensive rosters in the MLS of all time, Toronto FC has six of the top 10 entries. Over the past 17 years, Toronto has been the top-spending team seven times, or roughly 40% of the time. The L.A. Galaxy is next with five highest-spending years.

Highest salaries this year

And let’s look at this year.

sal_team_recent <- sal_team |> filter(year == recent_year)

# peek
sal_team_recent |> head(10)

Let’s round the numbers for our chart.

sal_team_recent_mil <- sal_team_recent |> 
  mutate(total_millions = (total_compensation / 1000000) |> round(1))

sal_team_recent_mil
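If the goal is only a readable label, another option is to leave the raw totals alone and let the scales package do the scaling at display time. This is a sketch, not what the chart below uses; `dollar_millions` is a name I made up, and the two input values are the rounded Messi and Insigne figures quoted earlier.

```r
library(scales)

# Hypothetical formatter: scale and suffix are arguments of scales::label_dollar()
dollar_millions <- label_dollar(scale = 1e-6, suffix = "M", accuracy = 0.1)

# Format raw dollar amounts as millions with one decimal place
dollar_millions(c(20400000, 15500000))
```

The same formatter could be passed to the labels argument of a ggplot2 scale, which avoids carrying a separate rounded column.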

Let’s chart this

sal_team_recent_mil_plot <- sal_team_recent_mil |> 
  filter(club_long != "Major League Soccer") |> 
  ggplot(mapping = aes(
    x = total_millions,
    y = club_long |> reorder(total_compensation)
  )) +
  geom_col() +
  geom_text(aes(label = paste0("$", total_millions)), color = "white", hjust = 1.25) +
  scale_x_continuous(labels = label_dollar()) +
  labs(
    x = "Total team spending in $ millions",
    y = "",
    title = "Messi makes Miami top MLS spender in 2023",
    subtitle = str_wrap("Salaries include each player's base salary plus all signing and guaranteed bonuses annualized over the term of the player's contract, including option years."),
    caption = "By: Christian McDonald. Source: Major League Soccer Players Association"
  ) +
  theme_minimal()


ggsave("figures/team-salary-2023.png", plot = sal_team_recent_mil_plot)
Saving 7 x 5 in image

One more look to see how many high-paid players are on each team.

sal_high_recent |> 
  filter(compensation >= 5000000) |> 
  count(club_short, sort = TRUE)

Data Takeaway: Miami, Toronto tops

Given the historic signing of Lionel Messi, it is no surprise that Inter Miami have the highest team salary for the 2023 season. Toronto ranks second on the power of having two players making over $5 million, Lorenzo Insigne and Federico Bernardeschi.

Team spending over time

Let’s look at team spending over the past five years. To do this, we have to create a ranking for the spending.

  • I’m removing players not affiliated with teams.
  • When I added a third column to the grouping because I wanted the long team names available, the ranking broke. I had to break the grouping, then use the .by argument of mutate() so that rank() works within each year.
sal_team_rank <- salaries |> 
  # drop the league-level "MLS" rows; rows with a missing club_short are kept
  filter(club_short != "MLS" | is.na(club_short)) |> 
  group_by(year, club_short, club_long) |> 
  summarise(
    total_comp = sum(compensation, na.rm = TRUE),
    .groups = "drop" # I break the grouping here
  ) |> 
  arrange(year, total_comp |> desc()) |> 
  mutate(rank = rank(-total_comp), .by = year) # then I set the ranking to work by year
# peek
sal_team_rank |> head(20)
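To show why the ranking broke, here is a toy version with made-up clubs and totals (every name and number in it is hypothetical). After summarising over three grouping columns, dplyr drops only the last one, so the data is still grouped by year and club_short. Each of those groups holds a single row, so every row ranks first. Ranking with .by = year, which needs dplyr 1.1.0 or later, compares clubs within the same year instead.

```r
library(dplyr)
library(tibble)

# Made-up team totals for two years
toy_totals <- tribble(
  ~year,  ~club_short, ~club_long, ~total_comp,
  "2022", "AAA",       "Club A",   30,
  "2022", "BBB",       "Club B",   20,
  "2023", "AAA",       "Club A",   10,
  "2023", "BBB",       "Club B",   40
)

# Broken: still grouped by year and club_short, so each group has one row
toy_broken <- toy_totals |>
  group_by(year, club_short) |>
  mutate(rank = rank(-total_comp)) |>
  ungroup()

# Working: ungrouped data, ranked within each year via .by
toy_ranked <- toy_totals |>
  mutate(rank = rank(-total_comp), .by = year)

toy_broken
toy_ranked
```

Note that rank() averages ties by default, so two clubs with identical totals would both get a fractional rank such as 1.5.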

Visualizing all of them would be tricky. Let’s do the top five over the last five years.

sal_team_rank_top <- sal_team_rank |> 
  filter(rank <= 5,
         year >= "2019")

sal_team_rank_top

Peek at this a different way

sal_team_rank_top |> 
  select(-total_comp) |> 
  pivot_wider(names_from = year, values_from = rank)

Let’s visualize spending rank

I want to use a list of team-specific colors. I’m trying this manually first, based on the results of the chart. The colors were pulled from here, though I wish I could instead join with a data package and pull from the team colors.

club_color_list <- c(
  "#80000A", # ATL
  "#7CCDEF", # CHI
  "#FEDD00", # CLB
  "#00245D", # LA
  "#C39E6D", # LAFC
  "#F7B5CD", # MIA
  "#CE0E2D", # NE
  "#5D9741", # SEA
  "#B81137" # TOR
  # "#EF3E42" # DC
  # "#00B140", # AUS
)
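An unnamed vector like this is matched to clubs purely by position, so it only lines up while exactly these nine clubs are in the chart. A possibly sturdier variant is a named vector, which scale_colour_manual() matches by value. The sketch below reuses the same hex codes and assumes the club_short values in the data match the abbreviations in the comments above; `club_colors_named`, `toy_ranks`, and `toy_plot` are hypothetical names and the ranks are made up.

```r
library(ggplot2)

# Same colors as above, keyed by club abbreviation
# (assumes these abbreviations match club_short in the data)
club_colors_named <- c(
  ATL  = "#80000A",
  CHI  = "#7CCDEF",
  CLB  = "#FEDD00",
  LA   = "#00245D",
  LAFC = "#C39E6D",
  MIA  = "#F7B5CD",
  NE   = "#CE0E2D",
  SEA  = "#5D9741",
  TOR  = "#B81137"
)

# Made-up ranks for two clubs, just to exercise the scale
toy_ranks <- data.frame(
  year = c("2022", "2023", "2022", "2023"),
  club_short = c("MIA", "MIA", "TOR", "TOR"),
  rank = c(2, 1, 1, 2)
)

toy_plot <- ggplot(toy_ranks, aes(x = year, y = rank, group = club_short, color = club_short)) +
  geom_line() +
  geom_point(size = 3) +
  scale_y_reverse() +
  # named values are matched to club_short by name, not by position
  scale_colour_manual(values = club_colors_named)

toy_plot
```

With names in place, a club dropping out of the top five no longer shifts every other club onto the wrong color.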

And now the chart

sal_team_rank_top_plot <- sal_team_rank_top |> 
  ggplot(aes(x = year, y = rank, group = club_short)) +
  geom_line(aes(color = club_short)) +
  # size is a fixed setting, so it goes outside aes()
  geom_point(aes(color = club_short), size = 3) +
  geom_label_repel(aes(label = club_short), size = 3) +
  scale_y_reverse() +
  scale_colour_manual(values = club_color_list) +
  labs(
    title = "Miami's spending was increasing before Messi",
    subtitle = str_wrap("The L.A. Galaxy are the only MLS team to rank as a top-five spender in each of the past five years."),
    color = "Club",
    x = NULL,
    y = "Spending Rank",
    caption = "By: Christian McDonald. Source: Major League Soccer Players Association"
  ) +
  theme_minimal() +
  guides(color = "none")
# sal_team_rank_top_plot

ggsave("figures/sal_team_rank-new.png", plot = sal_team_rank_top_plot)
Saving 7 x 5 in image

Let’s count how many times each team is in this list.

sal_team_rank_top |> 
  count(club_long, sort = TRUE)

Data Takeaway: Miami and LA

Among the teams spending the most on their rosters over the past five years, only the L.A. Galaxy have ranked in the top five each year. Toronto FC and Inter Miami have been top-five spenders in four of the last five years.