The International Powerlifting Federation (IPF) just released a new formula. I knew the Wilks Coefficient wasn’t perfect, but it was a way to compare lifters across weight classes, and I totally believe that having a bad model is better than having no model at all (i.e., simply comparing totals in kg).
Now, I don’t know what drama was going on between the IPF and Robert Wilks, but the IPF decided to release a new formula, citing something like “it was time.” I don’t know. As with any scoring system in any sport, people eventually start strategizing to maximize the score as part of the game.
So first off, let’s talk about the dataset. I downloaded the complete dataset from www.openpowerlifting.org. As I understand it, the data there is submitted by meet organizers and/or scraped from the federations’ own online databases. They describe their entire process here (https://www.openpowerlifting.org/faq). For the purposes of this project, I’m accepting this dataset as correct and complete.
To make sure the comparison is apples to apples, I filtered the results down to the IPF and IPF-affiliated federations. A comprehensive listing can be found here (https://www.powerlifting.sport/federation/regions.html). Further, I removed all entries from IPF-affiliated federations that don’t use the current, standard IPF weight classes (e.g., the men’s 82.5 kg, 90 kg, and 100 kg classes), since discrete weight class classification becomes important for part of my analysis later on. I’ve also limited this analysis to full-power (SBD), raw (without wraps) results only. Lastly, some IPF-affiliated federation meets were not drug tested; those were removed as well.
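A minimal sketch of this kind of filter in plain Python. It assumes the column names of the OpenPowerlifting bulk CSV as I understand them (`Sex`, `Event`, `Equipment`, `Tested`, `ParentFederation`, `WeightClassKg`) and uses `ParentFederation == "IPF"` as a stand-in for the affiliate list linked above. The weight-class sets are assumptions too; check them against the IPF rulebook for the date range being analyzed.

```python
import csv
from typing import Iterable, Iterator, List

# Assumed current IPF weight classes, stored as strings like "93" or "120+"
# in the WeightClassKg column. Verify against the IPF rules for your dates.
IPF_CLASSES = {
    "M": {"59", "66", "74", "83", "93", "105", "120", "120+"},
    "F": {"47", "52", "57", "63", "72", "84", "84+"},
}


def keep_row(row: dict) -> bool:
    """Return True if a result passes every filter described above."""
    sex = row.get("Sex", "")
    return (
        row.get("ParentFederation") == "IPF"   # IPF and its affiliates (assumed column)
        and row.get("Event") == "SBD"           # full power only
        and row.get("Equipment") == "Raw"       # raw, no knee wraps
        and row.get("Tested") == "Yes"          # drug-tested entries only
        and row.get("WeightClassKg", "") in IPF_CLASSES.get(sex, set())
    )


def filter_results(rows: Iterable[dict]) -> Iterator[dict]:
    """Lazily yield only the rows that pass keep_row."""
    return (row for row in rows if keep_row(row))


def load_filtered(path: str) -> List[dict]:
    """Stream the (large) bulk CSV and keep only matching rows."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(filter_results(csv.DictReader(f)))
```

Calling `load_filtered("openpowerlifting.csv")` (a hypothetical filename) would return the matching rows as dictionaries.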
This leaves me with about 96k unique results from 1826 meets broken down M/F as follows:
That’s easily enough data for a meaningful statistical analysis.
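Counting results, distinct meets, and the M/F split from the filtered rows could look like the sketch below. It assumes `Federation`, `Date`, `MeetName`, and `Sex` columns, and treats the tuple (`Federation`, `Date`, `MeetName`) as a hypothetical meet key; swap in a better identifier if the export has one.

```python
from collections import Counter


def summarize(rows):
    """Return (number of results, number of distinct meets, results by sex)."""
    # Hypothetical meet key: the same federation, date, and meet name = one meet.
    meets = {(r["Federation"], r["Date"], r["MeetName"]) for r in rows}
    by_sex = Counter(r["Sex"] for r in rows)
    return len(rows), len(meets), dict(by_sex)
```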
The Ideal Formula
To start out, I think I should lay out guidelines for what the ideal formula for comparing between weight classes/sexes should look like. First, as weight class changes, the scores shouldn’t trend up or down. In other words, the average score for the heavyweights shouldn’t be higher or lower than the average score for the lighter weight classes. We’re trying to find a metric to compare between them, so by default, no weight class should start out with more points (regardless of 3-lift total) than another.
Second, the distribution of scores at each bodyweight should have the same shape as all the others. If we lay out the standard “bell curve” of points in every weight class, someone starting at the average strength level who gains enough strength to surpass some percentage of people should gain the same number of points no matter what weight class they’re in. For example, if two lifters in different classes who both rank in the 50th percentile gain enough strength to move into the top 30% (the 70th percentile) of their respective classes, they should still have the same number of points as one another.
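The first criterion can be checked with nothing more than the mean score per weight class. A minimal sketch, assuming each result has already been reduced to a hypothetical `(weight_class, points)` pair:

```python
from collections import defaultdict
from statistics import mean


def mean_points_by_class(results):
    """results: iterable of (weight_class, points) pairs.
    Returns {weight_class: mean points}; ideally these are all about equal."""
    groups = defaultdict(list)
    for weight_class, points in results:
        groups[weight_class].append(points)
    return {wc: mean(pts) for wc, pts in groups.items()}


def mean_spread(results):
    """Highest class mean minus lowest class mean; near zero is the goal."""
    means = list(mean_points_by_class(results).values())
    return max(means) - min(means)
```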
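The second criterion can be phrased as: the points gained going from one percentile to another should be the same in every class. A sketch using the standard library’s `statistics.quantiles`; the 50th-to-70th step mirrors the example above, and the `inclusive` interpolation method is an illustrative choice, not something either formula prescribes.

```python
from statistics import quantiles


def points_at_percentile(points, pct):
    """Approximate points needed to reach percentile `pct` (an int from 1 to 99)."""
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
    return quantiles(points, n=100, method="inclusive")[pct - 1]


def gain_50_to_70(points):
    """Points gained moving from the 50th to the 70th percentile of one class."""
    return points_at_percentile(points, 70) - points_at_percentile(points, 50)
```

Under the second criterion, `gain_50_to_70` should come out roughly the same for every weight class: a class whose scores are merely shifted passes, while one whose scores are more spread out does not.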
The full Wilks Coefficient formula can be found here (https://en.wikipedia.org/wiki/Wilks_Coefficient). First, plotting lifters’ bodyweight against their Wilks points and running a linear regression yields a clear slope for both males and females.
There’s a clear bias towards heavier male lifters and lighter female lifters in this population. Without even looking at the distributions within each class, the Wilks formula is already flawed.
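For reference, the Wilks score is the total multiplied by 500 divided by a 5th-order polynomial in bodyweight. A sketch using the original Wilks coefficients as I recall them from the Wikipedia page linked above (double-check them there; the page may also list a later revision). Unlike some calculators, this version does not clamp bodyweight to a valid range.

```python
# Original Wilks polynomial coefficients (a, b, c, d, e, f), assumed from the
# Wikipedia article; verify before relying on them.
WILKS = {
    "M": (-216.0475144, 16.2606339, -0.002388645,
          -0.00113732, 7.01863e-06, -1.291e-08),
    "F": (594.31747775582, -27.23842536447, 0.82112226871,
          -0.00930733913, 4.731582e-05, -9.054e-08),
}


def wilks_coefficient(bodyweight_kg, sex):
    """500 / (a + b*x + c*x^2 + d*x^3 + e*x^4 + f*x^5), with x = bodyweight."""
    a, b, c, d, e, f = WILKS[sex]
    x = bodyweight_kg
    denominator = a + b * x + c * x**2 + d * x**3 + e * x**4 + f * x**5
    return 500 / denominator


def wilks_points(total_kg, bodyweight_kg, sex):
    """Wilks score: total scaled by the bodyweight coefficient."""
    return total_kg * wilks_coefficient(bodyweight_kg, sex)
```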
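The regression here is ordinary least squares of points on bodyweight. A dependency-free sketch of the fit; a formula with no bodyweight bias should give a slope (points per kg) near zero.

```python
from math import fsum


def ols_fit(xs, ys):
    """Least-squares line y = slope * x + intercept; returns (slope, intercept)."""
    n = len(xs)
    x_bar = fsum(xs) / n
    y_bar = fsum(ys) / n
    sxx = fsum((x - x_bar) ** 2 for x in xs)
    sxy = fsum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar
```

Calling it with bodyweights as `xs` and Wilks (or IPF) points as `ys` gives the slope discussed here.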
The IPF Formula is a function of a lifter’s total and bodyweight, with four constants that change depending on sex, equipment, and event.
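Written out, the four-constant IPF Points formula has the form below, where $T$ is the total (kg), $\mathrm{BW}$ is bodyweight (kg), and $c_1, \dots, c_4$ are the constants for the lifter’s category. Read this way, $c_1 \ln(\mathrm{BW}) - c_2$ acts like the expected total at that bodyweight and $c_3 \ln(\mathrm{BW}) - c_4$ like its standard deviation, so the score is a z-score rescaled to a mean of 500 and 100 points per standard deviation.

```latex
\text{IPF Points} = 500 + 100 \cdot
  \frac{T - \left( c_1 \ln(\mathrm{BW}) - c_2 \right)}
       {c_3 \ln(\mathrm{BW}) - c_4}
```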
(On a side note: I totally believe a log transformation of bodyweight makes a LOT more sense than plugging it into a 5th-order polynomial like Wilks does…)
First, plotting bodyweight against IPF Points with a line of best fit shows no apparent trend (the bodyweight coefficients are ~-0.009 for females and ~0.1 for males) and no bias toward heavier or lighter weight classes, which is a great improvement over the Wilks formula.
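In code, with the log of bodyweight doing the job the Wilks polynomial does, a sketch might look like this. The constants are the raw (classic) SBD values as I recall them from the IPF’s published table; treat them as assumptions and verify them before use.

```python
from math import log

# Assumed constants (c1, c2, c3, c4) for raw full-power SBD; verify against
# the IPF's published table.
IPF_RAW_SBD = {
    "M": (310.67, 857.785, 53.216, 147.0835),
    "F": (125.1435, 228.03, 34.5246, 86.8301),
}


def ipf_points(total_kg, bodyweight_kg, sex):
    """500 + 100 * (total - expected total) / spread, both driven by ln(BW)."""
    c1, c2, c3, c4 = IPF_RAW_SBD[sex]
    ln_bw = log(bodyweight_kg)
    expected_total = c1 * ln_bw - c2  # acts like the mean total at this bodyweight
    spread = c3 * ln_bw - c4          # acts like the standard deviation
    return 500 + 100 * (total_kg - expected_total) / spread
```

A lifter whose total equals the expected total for their bodyweight scores exactly 500 points.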
Next, the distribution of scores across weight classes needs to be evaluated. For an apples-to-apples comparison, it needs to be equally easy (or difficult) to increase an individual’s points in any weight class. Before we compare the entire range, this is what the ideal cumulative step histogram looks like. Basically, it shows the % of lifters below a given IPF Points level. For example, it shows that 600 IPF Points puts someone ahead of approximately 80% of all lifters in both the 93 kg and 105 kg weight classes.
Note: Both distributions fall almost perfectly on top of one another, so it is reasonable to conclude that they came from the same population. This fits the ideal formula described above; this is exactly what it should look like.
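The height of that cumulative step histogram at a given points level is just the fraction of lifters below it. A minimal sketch, where `points` is a hypothetical list of IPF Points for one weight class:

```python
from bisect import bisect_left


def share_below(points, level):
    """Fraction of lifters in a class with strictly fewer points than `level`,
    i.e. the height of the cumulative step histogram at that level."""
    ordered = sorted(points)
    # bisect_left counts the values strictly less than `level`.
    return bisect_left(ordered, level) / len(ordered)
```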
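Eyeballing the curves works, but the largest vertical gap between two cumulative step histograms can also be measured directly: that gap is the two-sample Kolmogorov-Smirnov statistic. This is a swapped-in technique, not part of the analysis above; below is a plain-Python sketch (for a p-value as well, `scipy.stats.ks_2samp` computes the same statistic plus a test).

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    # The supremum of |F_a - F_b| is reached at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap
```

With samples this large, even small gaps tend to come out statistically significant, so the size of the gap is arguably more informative than the p-value.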
But now let’s look at two contrasting weight classes. (Note: I’m not using the 59s and the 120+s because of the extreme bodyweight ranges in those classes; I’ll get to this later.)
Notice that, unlike the 93 kg and 105 kg classes, there is some distance between the two distributions. Looking at this graph, it’s easier to score more points at lower strength levels (below ~500 points, where you have to surpass a smaller % of people) and harder at higher strength levels (above ~650 points, where you have to surpass a larger % of people) in the 120 kg class than in the 66 kg class.
This discrepancy isn’t as bad as with the Wilks formula, which heavily favored the heavier classes. Note how, under Wilks, it is easier to score higher as a 120 kg lifter than as a 66 kg lifter at all strength levels.
Potential Problems With This Analysis
I chose not to compare the 59 kg lifters against the 120+ kg lifters because weight class classification doesn’t account for a potential 50 kg bodyweight difference between the “small” and “large” 120+ kg lifters. This analysis doesn’t account for how the distribution changes continuously at the extreme ends of human size.
This is a great change by the IPF. The Wilks formula was very biased towards large male lifters and small female lifters. In the middle weight classes, I think it is reasonable to use this formula to compare between weight classes, as it takes the same number of points to surpass a given percentage of the population.
Philosophically, I think there might be an argument for the super heavyweights having a higher score on average. Generally speaking, powerlifters don’t accidentally become super heavyweights. For most, I think it’s a deliberate choice to move the most weight possible, regardless of bodyweight. But eh, that debate is above my pay grade.