What Machine Learning Can Tell Us About MLS Teams’ Roster Construction

(Information Correct As of June 3rd)

As Major League Soccer has grown in popularity, more and more attention has been paid to how teams distribute salaries under the salary cap. Other than Messi and David Beckham, one of the most famous features of MLS is the stringent salary cap that keeps the league remarkably balanced. This cap is, at least in part, responsible for no team winning the MLS Cup in consecutive years since LA Galaxy in 2011 and 2012.

This parity makes the league more interesting for neutrals who want to see upsets and not know the outcome of the season by June, but it also makes roster construction a difficult task for front offices around the league.

Teams around the league have all struggled with the tradeoffs of a system with so many regulations and such a tight budget. This has often driven teams to pursue very different roster construction models. These include focusing on aging superstars, looking to South America for exciting young talent, and even trying to cultivate homegrown talent.

All of these models are potentially successful, but surely there are general changes that can be made to increase team success, right? Well, with the ever-increasing presence of Artificial Intelligence in daily life, it was only a matter of time before it was used to look at this complex issue. 

My recent paper analyzed the results of the most successful of four Machine Learning models to discuss the underlying roster trends in Major League Soccer. The most successful model, an Extreme Gradient Boosting model (or XGBoost for short), combines many weaker models, each one correcting the errors of the ones before it, to create the most accurate model possible. This allows for predictions that are still relatively accurate despite the small amount of data available, and it can provide insights into the best practices of an MLS squad under the salary cap.
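For anyone curious what "combining many weaker models" looks like in practice, here is a minimal sketch of gradient boosting in plain Python. It is not the XGBoost library and not the model from my paper: the weak models are simple one-split "stumps", and the salary and rank numbers, along with every function name, are made up purely for illustration.

```python
# Toy gradient boosting on one feature, using decision stumps as the weak models.
# This is NOT the XGBoost library, and the data below is invented for illustration.

def fit_stump(xs, residuals):
    """Pick the single threshold split that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_rounds=50, learning_rate=0.3):
    """Start from the average, then let each new stump correct what is still wrong."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate


def predict_boosted(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def training_error(model, xs, ys):
    """Mean squared error on the data the model was fitted to."""
    return sum((predict_boosted(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical teams: salary standard deviation (in $ millions) and final rank.
toy_salary_std = [0.2, 0.3, 0.5, 0.6, 0.8, 1.1]
toy_final_rank = [13.0, 12.0, 9.0, 8.0, 4.0, 2.0]

one_round = fit_boosted(toy_salary_std, toy_final_rank, n_rounds=1)
fifty_rounds = fit_boosted(toy_salary_std, toy_final_rank, n_rounds=50)
print("error after 1 round:  ", training_error(one_round, toy_salary_std, toy_final_rank))
print("error after 50 rounds:", training_error(fifty_rounds, toy_salary_std, toy_final_rank))
```

Any single stump is a poor predictor on its own, but stacking fifty of them fits the toy data far more closely than one does. The real XGBoost adds a lot on top of this (full trees, regularization, many features at once), so treat this only as the core idea.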

The impact of each of the statistics the models were trained on can be seen in the impact graph above. This shows that the most influential features of a roster, in terms of final ranking, all have to do with the distribution of money among positions and the salary variability. This doesn’t show how the features impact team performance, though. That is seen below.

This is called a violin plot. It shows the impact of every feature on the final output (team rank) according to the feature’s value, and it underlines the importance of being deliberate about salaries by position. Because a better team has a lower rank, the plot shows that higher defender salaries are clearly associated with less success. On the other hand, the highest goalkeeper salary is clearly associated with success, which suggests that having a good starting keeper is very important for doing well in the league. An ideal roster still keeps the average goalkeeper salary low, though.

Looking now at salary variability, there are also some interesting relationships that point to the benefits of increasing it. For both the salary standard deviation and the interquartile range (IQR), higher salary variability in the roster is associated with better performance. The highest salary is slightly correlated with worse team performance, but this doesn’t mean having an expensive Designated Player (DP) is necessarily bad. The relationship simply suggests that spending more money on a DP doesn’t mean they will be markedly better than a slightly less expensive signing.
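If you want to compute these variability numbers for your own team, here is a small sketch using only Python's standard library. The two rosters are invented, the function name is my own, and the exact definitions (sample standard deviation, Python's default quartile method) are assumptions rather than necessarily what the paper used.

```python
import statistics


def salary_variability(salaries):
    """Summarize how spread out a roster's salaries are."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(salaries, n=4)
    return {
        "std_dev": statistics.stdev(salaries),  # sample standard deviation
        "iqr": q3 - q1,                         # interquartile range
        "max_salary": max(salaries),            # the highest salary on the roster
    }


# Two hypothetical eight-player rosters (made-up numbers, in dollars).
flat_roster = [600_000, 620_000, 650_000, 680_000, 700_000, 720_000, 750_000, 780_000]
spread_roster = [90_000, 120_000, 200_000, 350_000, 600_000, 900_000, 1_500_000, 3_000_000]

flat_summary = salary_variability(flat_roster)
spread_summary = salary_variability(spread_roster)
print("flat roster:  ", flat_summary)
print("spread roster:", spread_summary)
```

The second roster pays a couple of players far more than the rest, so both its standard deviation and its IQR come out much higher than the evenly paid roster's.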

These findings are an interesting way of looking at trends in what would theoretically make a team roster more successful. This isn’t a perfect approach to the issue by any means, but it allows for a better general understanding of roster construction, and it is always fun to play armchair GM and argue that Brooks Lennon makes too much money. But what is potentially even more interesting than looking at this on the league-wide level is looking at specific teams.

Sorry, Walker Zimmerman

To look at team salaries we use a tool called SHAP (SHapley Additive exPlanations), which uses Shapley values from cooperative game theory to find the impact of each feature on the output. A low average goalkeeper salary, for instance, would decrease the predicted rank, meaning that the team was predicted to be more successful.
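To give a feel for the game theory behind SHAP, here is a tiny sketch that computes exact Shapley values by brute force. It does not use the SHAP library, which relies on much faster methods, and the "model" is a formula I invented for illustration, with made-up coefficients and feature values in millions of dollars.

```python
from itertools import permutations


def toy_predicted_rank(features):
    """Invented stand-in for a trained model (lower rank = better team)."""
    return (10.0
            + 2.0 * features["avg_gk_salary"]
            + 1.0 * features["avg_defender_salary"]
            - 1.5 * features["salary_iqr"]
            + 0.5 * features["avg_gk_salary"] * features["avg_defender_salary"])


def shapley_values(predict, instance, baseline):
    """Average each feature's marginal effect over every order of 'switching on' features."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)       # start from a league-average team
        previous = predict(current)
        for name in order:
            current[name] = instance[name]  # swap in this team's real value
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


# Hypothetical league-average team and one hypothetical team (values in $ millions).
league_average = {"avg_gk_salary": 0.3, "avg_defender_salary": 0.5, "salary_iqr": 0.4}
example_team = {"avg_gk_salary": 0.1, "avg_defender_salary": 0.9, "salary_iqr": 0.6}

contributions = shapley_values(toy_predicted_rank, example_team, league_average)
for feature, impact in contributions.items():
    print(f"{feature}: {impact:+.3f}")
```

A negative number means the feature pushed the predicted rank down, which is good for the team. The contributions always add up to the gap between this team's prediction and the league-average prediction, and that is what makes this approach a fair way of splitting the credit.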

One of the clearest examples of the importance of differing salaries by position is Nashville SC. Last year, they paid Walker Zimmerman slightly more than 2 million dollars for his services at center back, and the model suggests they might have been hurt by that decision. Having a good defender is great, and I’m not arguing Walker Zimmerman hurt the team in any way. I am saying that $2,056,979 is too much for a defender. The team splashing out on Joe Willis, one of the better MLS goalkeepers, was distinctly positive, though.

A more successful example of roster construction last year was FC Cincinnati. They finished first and made a number of good roster decisions. Their salary IQR and standard deviation were both high, and their average goalkeeper salary was low. They did everything about as well as possible given the complications of actually applying these concepts in the real world with real players and contract negotiations.

But while looking at the reasoning behind team success last year is interesting, looking at predictions for this year is far more exciting.

Messi Breaks the System

Before breaking down why certain teams are projected to finish where they are, I’ll simply show you the predicted table.

Rank  Eastern Conference    Western Conference
1     Columbus Crew         Seattle Sounders FC
2     DC United             Portland Timbers
3     FC Cincinnati         LA Galaxy
4     Philadelphia Union    St. Louis SC
5     Atlanta United        Real Salt Lake
6     Charlotte FC          Colorado Rapids
7     Nashville SC          LAFC
8     New York City FC      FC Dallas
9     CF Montreal           Vancouver Whitecaps
10    Orlando City SC       Sporting KC
11    New York Red Bulls    Minnesota United
12    New England Revs      San Jose
13    Chicago Fire          Austin FC
14    Toronto FC            Houston Dynamo
15    Inter Miami

*All results are based on rankings relative to each other, i.e., a predicted rank of 6 or of 9 could each be the 7th lowest in the conference, so the team would rank 7th either way.
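The footnote is easier to see with a quick sketch. The team names and predicted values here are placeholders, not actual model output: the table position comes only from sorting the raw predictions, so the size of the number matters less than where it falls relative to everyone else.

```python
def relative_table(predicted_ranks):
    """Turn raw predicted rank values into table positions (1 = best)."""
    ordered = sorted(predicted_ranks.items(), key=lambda item: item[1])
    return [(position, team) for position, (team, _value) in enumerate(ordered, start=1)]


# Placeholder predictions for three made-up teams.
raw_predictions = {"Team A": 9.2, "Team B": 3.1, "Team C": 6.4}
print(relative_table(raw_predictions))

# Team A's raw prediction improves, but it is still the highest of the three,
# so its position in the table does not change.
adjusted_predictions = {"Team A": 6.9, "Team B": 3.1, "Team C": 6.4}
print(relative_table(adjusted_predictions))
```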

Looking now at specific team predictions for 2024, Messi continues to break the league. He falls into the category of superstar whose impact can’t be described numerically, or at least not in stats applicable to all teams. In addition to bringing a level of skill bordering on magic, Messi also affects who the team can sign. By effectively creating a Barcelona Legends team, David Beckham guaranteed he could sign any young South American player for less than anyone else outside of the top European teams. And who can blame the kids? I’d also want to play with Messi, Suarez, Alba, and Busquets. Missing this unique benefit is why Miami was predicted last despite currently sitting in first (although Cincinnati has a game in hand).

The prediction can’t account for the benefits Messi brings even at his incredible salary. Jordi Alba’s 1.5 million dollar salary would also normally be a poor decision, but this team is the anomaly to end all anomalies.

Looking at a more typical squad, Columbus has been underperforming its very lofty preseason expectations. Coming off a championship, the Crew certainly would have been hoping for better than their current 7th in the East. If it is any consolation, their roster suggests they still have the potential to succeed: the model predicted them to finish high with a generally very well-put-together team.

Looking to the West, the Rapids have to be pleased with their current string of performances after finishing last in the Western Conference last season. Currently in 8th, they have benefitted from good salary variability and a good starting goalkeeper, although their average goalkeeper salary would ideally be lower.

This all goes to show that there are many paths to a successful roster, as well as many pitfalls for the front office to avoid. The salary cap makes sure that the focus can’t be how much you spend but how wisely you spend it, and that always makes for interesting discussions about team rosters.

