library(tidyverse)
library(janitor)
options(dplyr.summarise.inform = FALSE)
This notebook is a work in progress. There are no particular findings at this time.
Does spending more on players ensure a better season? I want to compare season points to salaries spent by MLS teams.
I thought I had found some standings data on Kaggle, but it only goes through part of 2021.
I worked through the comparison using 2020 data, but there is nothing much to show.
I scraped the MLS 2023 standings and put them in a Google Sheet, where I hand-edited some columns to clean them up. (It was just quicker because of some inconsistent patterns in the scraped team names.)
I haven’t used the 2023 data yet.
mls_standings <- read_csv("data-raw/all_tables.csv") |> clean_names()
mls_salaries <- read_rds("data-processed/mls-salaries.rds")
mls_standings_2023 <- read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vQwM9NKGZEc5CKVR2z7IoEUq0nij77xNu6kz8Rua4JFNfqt0sOe5RaE8gwW2lZ0nSsAFH5aCNUQJrcV/pub?output=csv")
Starting with salaries data since this is “cleaned” already.
Checking what we have for standings data.
There are some issues here with team names. Some names carry footnote markers, and some differ from the names used in the MLS salaries data.
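A minimal sketch of how the mismatched names might be spotted before recoding them. The team names below are made-up stand-ins, not pulled from the real standings or salaries data, and `standings_names`, `salary_names`, `cleaned_names` and `unmatched_names` are hypothetical names used only for illustration.

```r
library(stringr)

# Hypothetical team names standing in for the real standings and salaries data
standings_names <- c("Columbus Crew SC (C)", "Los Angeles FC", "Seattle Sounders FC[a]")
salary_names <- c("Columbus Crew", "LAFC", "Seattle Sounders FC")

# Strip single-character notes such as "(C)" or "[a]", then trim whitespace
cleaned_names <- standings_names |>
  str_remove_all("\\[.\\]|\\(.\\)") |>
  str_trim()

# Names that still have no match in the salaries data need manual recoding
unmatched_names <- setdiff(cleaned_names, salary_names)
unmatched_names
```

Whatever is left in `unmatched_names` is a candidate for a manual recode.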
std_20_clean <- mls_standings |>
  # Keep only the overall 2020 table
  filter(year == "2020", conference == "Overall") |>
  mutate(
    # Strip single-character notes such as "[a]" or "(x)" from team names
    team_clean = str_remove_all(team, regex("\\[.\\]|\\(.\\)")) |> str_trim(),
    # Recode names that differ from those used in the salaries data
    team_new = case_match(
      team_clean,
      "Columbus Crew SC" ~ "Columbus Crew",
      "Houston Dynamo" ~ "Houston Dynamo FC",
      "Los Angeles FC" ~ "LAFC",
      "Vancouver Whitecaps FC" ~ "Vancouver Whitecaps",
      .default = team_clean
    )
  ) |>
  arrange(team_new) |>
  distinct(team_new, pts)
std_20_clean
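The plot that follows uses a combined table of points and team salaries. A rough sketch of how such a table could be built is below. It assumes player-level salary rows with `club_long`, `conference` and a `total_compensation` column; that last column name, the toy tibbles and every number in them are made up for illustration and may not match the real `mls_salaries` data.

```r
library(dplyr)
library(tibble)

# Hypothetical player-level salaries; the real mls_salaries columns may differ
salaries_toy <- tibble(
  club_long = c("LAFC", "LAFC", "Columbus Crew"),
  conference = c("Western", "Western", "Eastern"),
  total_compensation = c(1000000, 500000, 750000)
)

# Hypothetical cleaned standings in the shape of std_20_clean (made-up points)
standings_toy <- tibble(
  team_new = c("LAFC", "Columbus Crew"),
  pts = c(30, 40)
)

# Total salary per club, then attach season points by matching team names
combined_toy <- salaries_toy |>
  group_by(club_long, conference) |>
  summarise(team_salary = sum(total_compensation), .groups = "drop") |>
  inner_join(standings_toy, by = join_by(club_long == team_new))

combined_toy
```

An `inner_join()` silently drops clubs whose names do not match, so running `anti_join()` in both directions first is a cheap way to confirm that nothing was lost.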
ggplot(combined_20) +
  aes(x = pts, y = team_salary) +
  geom_point(aes(color = conference), size = 4) +
  # geom_text(aes(label = club_long)) +
  ggrepel::geom_text_repel(aes(label = club_long), size = 3)
There isn’t a clear relationship between points and team salary, really. At least not for this year. The plot is also difficult to read.
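One way to put a number on the relationship between points and salary is a correlation coefficient. The sketch below uses made-up values in a hypothetical `toy` data frame, so the result says nothing about the actual 2020 season. It only shows the shape of the calculation.

```r
# Made-up points and team salaries for five hypothetical teams
toy <- data.frame(
  pts = c(30, 41, 25, 38, 33),
  team_salary = c(15e6, 9e6, 11e6, 12e6, 20e6)
)

# Spearman's rank correlation is less sensitive to one or two big spenders
# than the default Pearson correlation
rho <- cor(toy$pts, toy$team_salary, method = "spearman")
rho
```

A value near zero would be consistent with reading the scatterplot as showing no clear pattern.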