No April Fools' post for me. My prank was not posting through February and March. I got all you folks who kept checking in!
I am bringing back a regular post I have done in years past, though it has been several years now. Every Friday I aim to deliver MLB projected standings with a bit of commentary. It is a fun post to put together and a nice way for me to stay in touch with happenings around baseball. I hope it is at least as much fun to read. Today's the day to roll out the first projection because next Friday we will have the first update with actual games in the books!
The Model
(skip if you want to get straight to the preseason predictions)
My model takes a team's existing record and tacks on wins and losses based on a projection for how they will do the rest of the season. So, in this preseason run of the projections, the wins and losses are based entirely on the projection. At the end of the season, however, the model will be based entirely on a team's existing record because there will be no remaining games to project. Over the course of the season, then, the projection gradually carries less and less weight.
Projecting a team's remaining games is driven by two major components: how good the team is and how good their opponents are. Estimates of team talent are taken from FanGraphs' rest of season (RoS) WAR projections for teams. WAR is the most all-encompassing single measure of player value available, so I use it in the model.
To convert WAR into an expected win percentage, I first find the average WAR total for a team in baseball and treat that as a .500 winning percentage. This should make sense, as a perfectly average WAR would suggest a perfectly average team that wins and loses the exact same number of games. Then I compare a team's actual WAR total to the average WAR total. If a team's WAR total is, say, 10 larger than the average WAR, then we would expect that team to win 10 more games than a .500 team (91-71 rather than 81-81 over a full season). Each team's WAR total is compared to the average WAR in this manner until we obtain an expected winning percentage for every team.
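The conversion above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name, the three-team league, and the WAR totals are made up, and it treats the WAR figure as a full-season total in which one WAR above average equals one extra win over 162 games.

```python
SEASON_GAMES = 162

def war_to_win_pct(team_war):
    """Map each team's projected WAR to an expected winning percentage.

    Assumes one WAR above the league average equals one win above an
    81-81 record over a full 162-game season.
    """
    avg_war = sum(team_war.values()) / len(team_war)
    return {
        team: (SEASON_GAMES / 2 + (war - avg_war)) / SEASON_GAMES
        for team, war in team_war.items()
    }

# Made-up WAR totals for a three-team league, for illustration only.
example = war_to_win_pct({"A": 43.0, "B": 33.0, "C": 23.0})
```

Here team A sits 10 WAR above the average of 33, so it comes out at 91/162, or roughly .562, while team B lands on exactly .500.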
WAR aims to eliminate context as much as possible, but for projecting records the context is important. For instance, the Cubs on paper appear to be a really good team, so they would be expected to win more games than an average team. However, not only do the Cubs have the good fortune of being better than many (or maybe all) of their opponents, they also never have to face themselves. In general, this means that good teams not only should perform better against average competition, but they also tend to play "softer" schedules because their schedules do not include themselves. Similarly, bad teams often play "tougher" schedules because they do not get to play themselves.
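A quick way to see the "never face themselves" effect is to average every team's expected win percentage except the team in question. The sketch below assumes a hypothetical four-team league with a perfectly balanced schedule, which is simpler than a real MLB schedule; the names and numbers are invented.

```python
def average_opponent_pct(win_pcts, team):
    """Average expected win percentage of every team except `team`."""
    others = [pct for name, pct in win_pcts.items() if name != team]
    return sum(others) / len(others)

# Hypothetical four-team league that averages exactly .500.
league = {"Good": 0.560, "Mid1": 0.500, "Mid2": 0.500, "Bad": 0.440}
good_sched = average_opponent_pct(league, "Good")  # about .480, a softer slate
bad_sched = average_opponent_pct(league, "Bad")    # about .520, a tougher slate
```

Even though the league as a whole is a .500 league, the best team's opponents average below .500 and the worst team's opponents average above it.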
My model splits a team's opponents into three groups: their divisional opponents, the interleague division they face, and the teams in the other two divisions of their own league. Each group is weighted in proportion to the number of games the team plays against it. The strength of each group is determined by the projected WAR totals of its teams, based on the process mentioned before. It should be noted that WAR totals will change as we get more data on individual players during the season, so this model indirectly incorporates breakout performances, sudden declines, significant injuries, and major trades.
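In code, that weighting amounts to a games-weighted average of the opponent groups. The following is a minimal sketch rather than the model itself; the group names, game counts, and group strengths are assumptions chosen for illustration.

```python
def schedule_strength(group_games, group_pct):
    """Games-weighted average win percentage of a team's opponents.

    group_games: games remaining against each opponent group.
    group_pct:   average expected win percentage of each group.
    """
    total_games = sum(group_games.values())
    weighted = sum(group_games[g] * group_pct[g] for g in group_games)
    return weighted / total_games

# Illustrative split of a 162-game schedule; the game counts and group
# strengths here are assumptions, not figures taken from the model.
games = {"division": 76, "other_intraleague": 66, "interleague": 20}
strength = {"division": 0.490, "other_intraleague": 0.505, "interleague": 0.510}
sos = schedule_strength(games, strength)
```

Because the weights are games remaining, the same function works in midseason: a team that has already finished its interleague slate would simply pass 0 games for that group.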
Finally, a team's expected winning percentage is increased or decreased proportionally according to the strength of their remaining schedule. So, if a team is expected to face exactly .500 opponents, then their expected win percentage does not move at all. However, if they face weaker opponents, then their winning percentage is bumped up. Similarly, stronger opponents nudge it down. All of these conversions are done proportionally so that the system remains closed - in other words, the structure of the model guarantees that summing all of the wins and losses together will come up with an exact .500 record for all of baseball. This is important since every game must have both a winner and a loser.
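The exact adjustment formula is not spelled out here, so the sketch below is a stand-in under a stated assumption: each team's percentage is shifted by the gap between .500 and its opponents' strength, and then the whole league is re-centered so it averages exactly .500. It also assumes every team has the same number of games remaining, as in the preseason.

```python
def adjust_for_schedule(win_pcts, sos):
    """Shift each team's win percentage by its schedule strength, then
    re-center so the league as a whole averages exactly .500.

    The additive shift (.500 minus opponent strength) is an assumed rule
    for illustration; the real model may scale things differently.
    """
    raw = {t: win_pcts[t] + (0.5 - sos[t]) for t in win_pcts}
    drift = sum(raw.values()) / len(raw) - 0.5
    return {t: pct - drift for t, pct in raw.items()}

# Hypothetical three-team league; each team's opponent strength is the
# average of the other two teams.
pcts = {"Good": 0.560, "Mid": 0.500, "Bad": 0.440}
opp = {"Good": 0.470, "Mid": 0.500, "Bad": 0.530}
adjusted = adjust_for_schedule(pcts, opp)
```

In this toy league the good team moves from .560 to about .590, the bad team drops from .440 to about .410, and the team facing exactly .500 opponents does not move.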
Once a team has a projected win percentage it is multiplied by the number of games remaining in their schedule to figure out how many projected wins remain. Those are tacked on to the games they have already won to obtain their projected win total. The projected loss total is simply 162 (all the games in a season) minus their projected win total.
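That last step is simple arithmetic, sketched below. The function name and the sample records are hypothetical, and rounding to whole wins is an assumption about how the final totals are presented.

```python
SEASON_GAMES = 162

def project_record(wins, losses, ros_win_pct):
    """Add projected rest-of-season wins to the wins already banked."""
    games_remaining = SEASON_GAMES - wins - losses
    projected_wins = round(wins + ros_win_pct * games_remaining)
    return projected_wins, SEASON_GAMES - projected_wins

# The same hypothetical .549 team at three points in a season.
preseason = project_record(0, 0, 0.549)    # (89, 73): all projection
midseason = project_record(50, 31, 0.549)  # (94, 68): half record, half projection
final = project_record(95, 67, 0.549)      # (95, 67): all actual record
```

The three calls show how the projection's influence shrinks as games are played: by the final day, the rest-of-season percentage no longer matters at all.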
This is the one time I will dive in with details about how the model works. In future weeks I will get straight to the results. Transparency with the model is important though, and I even see several places I could dig further into how the model works. Anyway...
MLB Projected Standings, Preseason
AL WEST | AL CENTRAL | AL EAST |
Astros, 85-77, 0 GB | Indians, 85-77, 0 GB | Red Sox, 84-78, 0 GB |
Mariners, 80-82, 5 GB | White Sox, 81-81, 4 GB | Yankees, 83-79, 1 GB |
Angels, 80-82, 5 GB | Tigers, 80-82, 5 GB | Blue Jays, 82-80, 2 GB |
Rangers, 79-83, 6 GB | Twins, 80-82, 5 GB | Rays, 80-82, 4 GB |
Athletics, 79-83, 6 GB | Royals, 79-83, 6 GB | Orioles, 80-82, 4 GB |
. |
NL WEST | NL CENTRAL | NL EAST |
Dodgers, 89-73, 0 GB | Cubs, 89-73, 0 GB | Mets, 88-74, 0 GB |
Giants, 85-77, 4 GB | Cardinals, 84-78, 5 GB | Nationals, 86-76, 2 GB |
Diamondbacks, 79-83, 10 GB | Pirates, 83-79, 6 GB | Marlins, 80-82, 8 GB |
Padres, 76-86, 13 GB | Reds, 77-85, 12 GB | Braves, 73-89, 15 GB |
Rockies, 76-86, 13 GB | Brewers, 75-87, 14 GB | Phillies, 72-90, 16 GB |
Wild card play-in games: Blue Jays at Yankees, Giants at Nationals
ALDS match-ups: play-in vs. Indians, Red Sox vs. Astros
NLDS match-ups: play-in vs. Cubs, Mets vs. Dodgers
Some musings:
- Preseason projections based on WAR always look a little bunched up. It's a result of regression to the mean. Almost by definition, playoff teams exceed their expectations in statistical models.
- With that said, the American League looks especially bunched up again this year. If that were simply a result of the projection system, then the NL would be similarly bunched, but it isn't. The AL East in particular looks incredibly wide open.
- The National League has a stratified look with rather clear "first" and "second" divisions. Only the Marlins and Diamondbacks do not fall neatly into one of the two categories.
- The Astros, Indians, Dodgers, and Cubs should all be considered legitimate favorites to win their divisions. None of them is far enough ahead of its divisional opponents to be called a prohibitive favorite, though.
- The Indians being frontrunners should probably be flagged with an asterisk. The Royals and WAR do not get along. They have looked like a middling team on paper the last two years but have back-to-back World Series appearances, with a championship to show for it. It's possible that the Royals are lucky, but the gap between their actual results and projections is noteworthy, and has been for two years now. Consider me skeptical of their projection this year too.
Next Friday we will find out how much movement only a few games make, even ones supposedly less meaningful early in the season.