When we look at offensive leaderboards, we can see the best and worst offenses, but we can’t tell which teams are similar. Likewise, we don’t know how a team’s approach compares with the other 29 teams across a chosen set of variables. Sure, we can sort by wRC+, OPS, or even WAR, and that will show us the best offenses based on production. However, what if we didn’t want to use production alone and focused instead on variables that describe a team’s play style? In this article, we will attempt to understand how teams perform offensively and identify which teams are similar to one another based on their play style rather than offensive production. We will do this using a machine learning algorithm called k-means clustering.
K-means clustering, in a nutshell, attempts to partition data into identifiable groups. It starts by placing a specified number of centroids within the dataset, often at random, and then iterates: each observation is assigned to its nearest centroid, and each centroid is moved to the average of the observations assigned to it. Repeating these two steps minimizes the distance between the observations in a cluster and that cluster’s centroid, so each group ends up as compact as possible. This is an unsupervised machine learning algorithm, which means there are no predefined labels for the algorithm to base the clusters on, so the number of groups must be chosen manually.
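For readers who want to see the mechanics, here is a minimal sketch of the assign-then-update loop described above, written with only the Python standard library. It is not the code used for this article: the `kmeans` function, its parameters, and the toy `points` are all assumptions made up for illustration, and the data are not real team stats.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k distinct observations chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
    return centroids, labels


# Invented two-variable data with two obvious groups (not real team stats).
points = [(0.1, 0.2), (0.0, 0.0), (0.2, 0.1), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Because the starting centroids are random, different seeds can number the clusters differently, and on messier data they can settle on different groupings, which is why implementations usually repeat the procedure from several starting points.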
The data I used for this article is from FanGraphs Team Offense leaderboards. Luckily, FanGraphs has an option to remove pitchers from the teams’ stats, so no pitchers will be included. The variables I will be using are ISO, Walk %, Strikeout %, GB/FB, Swing %, Contact %, and SwStr % (swinging-strike %). I chose these variables because I wanted a true picture of each team’s characteristics, using metrics that neither rely on the opposing defense nor directly describe production. ISO is probably the one variable you can quibble with: doubles and triples rely somewhat on defensive positioning and skill, while home runs are affected by park dimensions. It is also very much a production metric, given that it describes how much power a team has; however, I wanted to include it because power is such a staple in this game, and I felt ISO captured it best.
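One practical note on these variables: they live on very different scales (ISO is a small decimal, while the rate stats are percentages), and k-means is driven by distances. The article does not say whether the inputs were rescaled, so treat the following as an assumption about a common preprocessing step rather than a description of what was done here. The rows are invented numbers for three hypothetical teams, not FanGraphs values.

```python
from statistics import mean, pstdev

# Invented rows for illustration only: (ISO, Walk %, Swing %) for three
# hypothetical teams. These are not FanGraphs values.
rows = [
    (0.150, 8.0, 46.0),
    (0.170, 9.0, 47.5),
    (0.190, 10.0, 49.0),
]


def standardize(rows):
    """Rescale every column to mean 0 and (population) standard deviation 1."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) for col in columns]  # assumes no column is constant
    return [
        tuple((x - mu) / sigma for x, mu, sigma in zip(row, mus, sigmas))
        for row in rows
    ]


scaled = standardize(rows)
for row in scaled:
    print(row)
```

After rescaling, a one-standard-deviation gap in ISO counts the same as a one-standard-deviation gap in Swing %, so no single variable dominates the distance calculation simply because of its units.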
Before I get into the findings, I need to mention how I chose the number of clusters. Using the silhouette method, I came up with the output below:
The plot indicates that eight is the optimal number of clusters. Running the k-means algorithm with eight clusters created the groups below:
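As a rough illustration of what the silhouette method measures, the sketch below computes the average silhouette width for a given set of cluster labels using only the standard library. For each point it compares `a`, the mean distance to the rest of its own cluster, with `b`, the mean distance to the nearest other cluster; values near 1 mean tight, well-separated groups. The function name, the toy points, and the two candidate labelings are assumptions invented for this example, and it needs at least two clusters to work.

```python
import math


def mean_silhouette(points, labels):
    """Average silhouette width over all points for one labeling."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention: a one-member cluster scores 0
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's distance to itself is 0, so it adds nothing to the sum)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Invented data with two obvious groups (not real team stats).
points = [(0.1, 0.2), (0.0, 0.0), (0.2, 0.1), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
two_groups = [0, 0, 0, 1, 1, 1]    # a sensible 2-cluster labeling
three_groups = [0, 0, 1, 1, 2, 2]  # a poor 3-cluster labeling
score_two = mean_silhouette(points, two_groups)
score_three = mean_silhouette(points, three_groups)
print(score_two, score_three)
```

In practice the labels for each candidate number of clusters would come from running k-means at that value, and the number with the highest average score is the one the method recommends. Note the convention for one-member clusters, which matters here because Houston ends up alone in its group.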
Cluster 1: Houston
Houston has surprisingly (or perhaps not) been given its own group. Houston is elite in Contact %, Strikeout %, and SwStr %. While guys like Carlos Correa, Yuli Gurriel, Jose Altuve, and Alex Bregman have been the centerpieces of the Houston lineup and pretty much live by this high contact / low K % approach, Kyle Tucker has entered this group as well. Tucker has improved his Strikeout % from 20.9% in 2020 to 15.9% this year while also improving his SwStr % and Contact %. The four listed above and Tucker all have sub 10% SwStr %, and Myles Straw (who was traded to Cleveland) is still tied for the lowest SwStr % on the team. Their ISO is just above average, but ISO is slugging percentage minus batting average, so having the highest batting average in the league pulls it down even though their Slug % is the 4th highest. Overall, Houston has had one of the scarier lineups in the league, which should continue into the postseason, and it makes sense that they find themselves with their own cluster.
Cluster 2: Colorado, Los Angeles Angels, New York Mets, Cleveland, Baltimore, Kansas City, Texas
This group walks at a meager rate, and power is lacking, while plate discipline is below average at best. These teams tend to swing a lot more than most teams, and the result is not good when they do. When we look at the teams within this group, there are no surprises. Kansas City, Texas, the Angels, Baltimore, and Cleveland have the lowest walk % in the league. Colorado comes in at 19th and the Mets 17th. When we look at the swing % leaderboards, the Royals are the highest (30th), with the Mets 28th, the Rockies 26th, the Orioles 24th, the Rangers 21st, the Angels 20th, and the Indians 19th. These teams tend to be middle of the pack when it comes to contact % and SwStr %, with the Rockies leading the way with the 11th highest contact % in the league and the Angels with the 14th lowest SwStr %. The only significant differences within this group are the ISO numbers: Colorado 12th, Cleveland 13th, Baltimore 20th, the Angels 21st, the Mets 25th, Kansas City 27th, and Texas 29th.
In terms of production, Colorado feels a lot better than the other teams within this group. The Coors effect could be the reason, but in terms of the descriptive, non-production stats, they fit this group more than any other. The Angels haven’t had Mike Trout since May, Anthony Rendon has been out since the beginning of July after sporting a 96 wRC+ in the games he did play, and Shohei Ohtani has cooled off since the All-Star break. Since then, the Angels have relied upon a combination of rookies and utility players to fill the holes, which has not been ideal. The Mets have the highest Walk % within this group but have the 22nd best Hard-Hit % in the league, the 9th highest GB rate, and the 2nd highest Swing %. The Mets do a bunch of swinging, but when they make contact, it’s on the ground and isn’t hit very hard, which is not a recipe for success.
There are no real surprises here with Cleveland, Texas, and Kansas City. These offenses just haven’t been good at all. Cleveland and Texas traded their stars over the last year in Lindor and Gallo, respectively. The Royals traded Jorge Soler right before the trade deadline, and Salvador Perez is currently toe-to-toe with Vladimir Guerrero Jr. and Shohei Ohtani for the home run lead, but other than that, they have been bad as a whole. Kansas City has the 5th worst wRC+ while Texas has the worst in the league.
The one metric where these teams sit near the top is stolen bases. Yes, fun! The Royals have the most stolen bases in the league so far at 117, while Cleveland and Texas come in at 4th and 5th. FanGraphs’ Spd metric tells the same story: Cleveland and Kansas City are tied for 1st, while Texas comes in tied for 7th. While these teams have not been that good to watch when they swing the bat, they have been running, stealing bases, and providing a lot of action on the basepaths. That is fun to watch, but they still find themselves in one of the worst offensive groups in the league.
Cluster 3: Toronto, Oakland, St. Louis
This cluster is basically the fly ball plus low strikeout % combination. Toronto, Oakland, and St. Louis are all about making contact, and when they do, they would prefer to hit it in the air. In theory, this is difficult to accomplish, given that players with higher power numbers tend to have higher strikeout rates. However, this is not the case for one team in particular. The Blue Jays, who I would argue have been one of the most fun teams to watch offensively, have the 2nd highest ISO and highest Slug % in the league while having the 2nd lowest strikeout %. While that is great for Toronto, the power has not followed suit for Oakland (15th in ISO) and St. Louis (17th). One explanation is that Toronto has the 4th highest Hard-Hit % while Oakland and St. Louis find themselves 19th and 20th. While all three of these teams have similar descriptive metrics, the production has been much more fully realized by Toronto because they hit the ball hard.
Cluster 4: Chicago White Sox, Cincinnati, Minnesota, Philadelphia, Milwaukee, Arizona, Seattle
This is a group whose overall characteristics are tough to identify. It is another large cluster, made up of the White Sox, Reds, Twins, Phillies, Brewers, Diamondbacks, and Mariners, and it doesn’t drastically favor any one metric when compared to the other groups. These teams walk more than most, but most of the cluster centers sit in the middle, which I suspect reflects considerable variation within the group. Their plate discipline centers are around the middle, with Seattle ranking worst of the bunch at 27th and the Phillies best at 11th lowest. I believe this grouping results from these teams not being similar to other groups, so they all got lumped together here.
Cluster 5: Boston, Atlanta, Tampa Bay
This group lacks plate discipline and contact but hits the ball in the air for some power. This group describes the “juice ball” era play style. These three teams are in the top five of ISO, top eight of SwStr %, and, in terms of hitting fly balls, they range from 5th to 9th highest. The Strikeout % does tend to vary with these teams, as Tampa Bay seems to follow the notion that strikeouts are OK with enough home runs since they have the 5th highest Strikeout % in the league, with the Braves coming in at 11th and the Red Sox 19th.
It’s surprising to see Boston in this group because they were only ranked 14th in the league in ISO in 2020. However, the top three in plate appearances for the Red Sox last year (Rafael Devers, JD Martinez, Xander Bogaerts) all improved their ISO in 2021. Rafael Devers went from .230 to .255, JD Martinez from .175 to .232, and Xander Bogaerts from .202 to .212. They also signed Enrique Hernandez, who improved his ISO as well (from .180 to .202), and Hunter Renfroe, who brings more power to their lineup. That is a lot of individual improvement, and it is not a by-product of the environment, given that the league average ISO has come down from .173 in 2020 to .167 in 2021. It’s apparent they sought to bring power to this lineup, and they have done it without giving in to the strikeouts. This is all surprising until you realize who the Red Sox currently have running the front office: Chief Baseball Officer Chaim Bloom. His previous employer? You guessed it…the Tampa Bay Rays.
While Boston and Tampa Bay have both improved their ISO numbers, Atlanta has been at the top of the ISO leaderboards these last two years. They lost Ronald Acuna Jr. back in mid-July, but they acquired Adam Duvall and Jorge Soler at the trade deadline to help mitigate that loss and maintain their ISO ranking.
Cluster 6: Chicago Cubs, Detroit, Miami
This is probably the least ideal group you would want to be a part of. The Cubs, Tigers, and Marlins are pretty bad at just about everything right now. They don’t walk or make much contact, and when there is contact, it’s usually on the ground – no power and a complete lack of discipline at the plate. In addition, Chicago traded Anthony Rizzo, Kris Bryant, and Javier Baez at the trade deadline and traded Joc Pederson to Atlanta in mid-July. As a result, if this analysis had been done right before the trade deadline, the Cubs would have probably found themselves in a different group, but that is not the case.
Akil Baddoo has been an excellent young Rule 5 draft pickup for Detroit, currently sporting a 110 wRC+. Jonathan Schoop has played well enough to earn him a two-year extension that was signed this August, and Jeimer Candelario has a 120 wRC+ through 141 games. They have been a pretty decent offense to watch, but they just wholly lack plate discipline, finding themselves with the highest SwStr % and Strikeout % in the league.
While Miami is 23rd in wRC+, they find themselves with the 3rd worst ISO in the league, and dealing Starling Marte and Adam Duvall at the trade deadline surely didn’t help. While Detroit has the worst Strikeout % in the league, Miami follows suit with the 2nd worst. However, the one thing that the Marlins do well is run. They have the third-most stolen bases in the league. Looking at the Spd metric among players with at least 400 plate appearances, Miami had two of the three fastest players in the league, with the now-departed Starling Marte second and Jazz Chisholm third.
Cluster 7: Washington, San Diego, Pittsburgh
This is another surprising cluster because Washington, San Diego, and Pittsburgh would not be expected to be grouped together. But they are. These teams are among the top 5 in Contact %, SwStr %, and Strikeout %. However, the problem is that they all find themselves within the top 7 of ground ball % (Washington 1st, San Diego 6th, Pittsburgh 7th) and are in the bottom half of the league in terms of ISO (Washington 18th, San Diego 22nd, Pittsburgh 30th).
The Pirates have Bryan Reynolds, who has been their star player and has the highest ISO on the team (.217) and a wRC+ at 136. There’s not much to write home about after that. Ben Gamel has the next highest ISO on the team at .154 (among all Pirates who have at least 250 plate appearances). The Nationals have their star in Juan Soto, and also Josh Bell, but they did lose Trea Turner and Kyle Schwarber at the trade deadline. The one evident surprise here is the Padres. They had the 3rd highest ISO in 2020 but have fallen off since, and while Washington and Pittsburgh are not in a competitive window, the Padres are, which is even more surprising. A possible explanation could be their Hard-Hit %, which fell from 2nd in 2020 to 13th in 2021, and their home run per flyball %, which fell from 4th in 2020 to 23rd in 2021. The small sample may have helped the Padres in 2020, and it has perhaps been corrected this year.
Cluster 8: San Francisco, Los Angeles Dodgers, New York Yankees
The final cluster includes the Giants, Dodgers, and Yankees. This group is very patient at the plate, with the lowest Swing % and highest Walk %. When they do swing, they hit the ball in the air, and those fly balls do damage, given their high ISO numbers (Dodgers 1st, Yankees 4th, Giants 7th). They have some of the lowest SwStr % in the league but will still strike out at times.
Of course, it would not be 2021 without the Dodgers and Giants finding themselves in the same group here. They have been neck and neck in the NL West all season, and it’s not a surprise these teams are built similarly. Farhan Zaidi was the General Manager of the Dodgers from 2014 to the end of the 2018 season and then became the President of Baseball Operations of the Giants. The Giants have been a menace offensively and have been doing this with resurgent seasons from Brandon Crawford (139 wRC+), Buster Posey (140 wRC+), and Brandon Belt (150 wRC+), while hitting on some under the radar moves from players like Mike Yastrzemski, LaMonte Wade Jr., and Darin Ruf (a former Dodger who played in the KBO in 2019).
Simultaneously, this has been the play style for the Dodgers and Yankees for several years now. For example, in 2017 and 2018, they were both top 3 in walk %, with the Dodgers having the 4th highest walk % in 2019 and the Yankees having the league’s highest last year. Both teams also finished in the top 4 in ISO each season from 2018 to 2020.
And there we have it. These 8 clusters do not showcase which teams have the best offenses or which are the best teams in general. This model does not account for the teams’ pitching and fielding abilities, which are also very important in determining whether a team is good or not. Instead, it describes how these offenses have played based on a chosen set of metrics and shows which teams resemble each other. It does highlight some traits that playoff teams possess: we can look at Clusters 2, 6, and 7, which will probably not produce a playoff team this year, and examine the metrics that describe those groups. The metrics used in the model could also be varied, but I believe these variables accomplished the goal of this article without going overboard with too many metrics. Regardless, this was a fun way to compare all the teams right before we start the postseason.
Featured image courtesy of @astros on Twitter.
All data is accurate through 09/21.