In Search of the Greatest World Cup Goal Scorer – Part III: Introducing the Goal Importance Factor

So, let’s get down to the nitty-gritty. So far, we have done rather straightforward things like averaging and adjusting the tallies so they become comparable over time. But every goal still had the same value within a tournament and no goal was discarded. Well, that’s about to change! I already mentioned in the previous post that in order to get things right we also have to consider the order and magnitude of the goal. And that is where the goal importance factor (IF) comes in.

Full disclosure here: this factor is inspired by a statistic I read in “The Hockey News”, where the idea was to devalue NHL goal scorers who get their goals (and assists) at times when the game was already decided. I adapted it to soccer, and it is basically determined as follows:

  • a goal counts for 1pt if it is a go-ahead goal or a game-tying goal.
  • a goal counts for 0.5pts if it puts the scoring team up by two goals or reduces the deficit to 1 goal.
  • any other goal counts for 0pts!
  • If it is an own goal, the corresponding negative points are awarded.
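A minimal sketch of these rules in Python could look like the block below. The function names are made up for illustration, and the own-goal handling rests on an assumption: that "corresponding negative points" means the player is charged exactly what the goal is worth to the team that benefits from it.

```python
def goal_importance(benefiting_goals, opponent_goals, own_goal=False):
    """Importance factor (IF) of a single goal.

    Both scores are taken right after the goal, seen from the team
    that benefits from it.
    """
    margin = benefiting_goals - opponent_goals
    if margin in (0, 1):      # game-tying or go-ahead goal
        points = 1.0
    elif margin in (2, -1):   # up by two, or deficit reduced to one
        points = 0.5
    else:                     # any other goal
        points = 0.0
    # Assumption: an own goal costs its scorer the same number of points.
    return -points if own_goal else points


def match_importance(benefiting_teams):
    """IF of every goal in a match, given the benefiting team of each goal in order."""
    score = {}
    factors = []
    for team in benefiting_teams:
        score[team] = score.get(team, 0) + 1
        opponent_goals = sum(g for t, g in score.items() if t != team)
        factors.append(goal_importance(score[team], opponent_goals))
    return factors


# 1998 final (France 3-0 Brazil): the third goal only runs up the score.
final_1998 = match_importance(["FRA", "FRA", "FRA"])
# 1954 final (West Germany 3-2 Hungary, after trailing 0-2).
final_1954 = match_importance(["HUN", "HUN", "GER", "GER", "GER"])
```

Under these rules the 1998 final yields 1, 0.5 and 0 points, while the 1954 final yields 1, 0.5, 0.5, 1 and 1.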

Here are a few examples from World Cup finals that should illustrate the idea, but will also showcase some of its weaknesses:

The focus on older World Cup finals is simply because these had a lot more goals, which makes it easier to demonstrate the computation. Now, I like how goals that run up the score do not count (1998 final), but the points given are not entirely perfect. The goals making the score 3-1 in 1938 and 1958 should count for more than 0.5 pts since they basically were the winning goals (more on that in the next post). The second and third goals in 1954 are arguably undervalued as well, in particular the third goal, which put Germany back into contention. I could have made an adjustment to have these goals count fully, but this would make for a much harder implementation (admittedly not a great reason, but hey, it is my time spent here 😉) and less consistency. In future posts, we will try to address these issues in a different way, but to be honest, I have thought a lot about the IF and came to the conclusion that I like it the way it is for its simplicity.

Of course, to compare fairly over time, we also have to adjust the IF by the goal value, GV, from the previous post and compute the importance factor value as

IFV = IF*GV
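As a rough sketch of how this can be applied per player: the block below assumes that, for players with several World Cups, IF*GV is computed per tournament and then summed, in the same way as the adjusted goal tally in Part II. The function name is hypothetical. Which GV column (raw or LOESS) feeds the lists in this post is not stated, so both are shown; for a single-tournament player like Just Fontaine (IF of 8.0, all in 1958), the listed IFV of 5.890 seems to sit closer to the LOESS variant.

```python
def importance_factor_value(if_by_tournament, gv_by_tournament):
    """IFV = IF*GV, applied per tournament and summed (summing is an assumption)."""
    return sum(points * gv_by_tournament[year]
               for year, points in if_by_tournament.items())


# Goal values for 1958, copied from the GV table in Part II.
RAW_GV = {1958: 0.742}
LOESS_GV = {1958: 0.736}

# Just Fontaine collected his entire IF of 8.0 at the 1958 World Cup.
fontaine_if = {1958: 8.0}
fontaine_raw = importance_factor_value(fontaine_if, RAW_GV)      # about 5.94
fontaine_loess = importance_factor_value(fontaine_if, LOESS_GV)  # about 5.89
```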

And with this, the updated scoring lists would look like this:

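| # | Team | Name | Goals | IF | IFV | Rank by IF |
|---|------|------|-------|----|-----|------------|
| 1 | GER | Miroslav Klose | 16 | 10.5 | 10.678 | 1 |
| 2 | BRA | Ronaldo | 15 | 9.5 | 9.539 | 2 |
| 3 | GER | Gerd Müller | 14 | 9.5 | 8.615 | 3 |
| 4 | ITA | Paolo Rossi | 9 | 8.5 | 8.031 | 4 |
| 5 | GER | Jürgen Klinsmann | 11 | 8.0 | 7.849 | 5 |
| 6 | ITA | Roberto Baggio | 9 | 7.5 | 7.361 | 8 |
| 7 | POL | Grzegorz Lato | 10 | 8.0 | 7.358 | 5 |
| 8 | ESP | David Villa | 9 | 7.0 | 7.141 | 9 |
| 9 | ITA | Christian Vieri | 9 | 7.0 | 6.927 | 9 |
| 10 | ENG | Gary Lineker | 10 | 7.0 | 6.862 | 9 |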

Well, I gotta give it to Klose. This was my first real attempt at toppling him from the top spot, and yet he is still there! He obviously scored many vital goals, and even though his score is reduced to just above 65% of his goal tally (which is actually slightly above Ronaldo’s), Miroslav Klose remains in the top spot. That percentage, btw, will be called the importance ratio, IFR, and is computed as IFR = IF/Goals (for a player whose goals all come from a single tournament, this equals IFV divided by the adjusted goal tally). I guess my personal dislike for Miroslav Klose being in the top spot is based less on his feats at the World Cup than on the fact that at club level he was never really playing at the highest level. But then, only the World Cup is being considered here and he clearly made his mark. So hats off to Miroslav Klose! We will have to dig more!

So, with my newfound respect for Miroslav Klose, what else do we observe? Well, Ronaldo’s 15 goals are not really more impressive than Klose’s 16, at least for now, but scoring twice in a World Cup final is probably the one edge he holds over his successor as top scorer. Gerd Müller has the same IF as Ronaldo, but gets hammered on GV. Other than that, we suddenly have a strong Italian presence with Rossi, Baggio and Vieri entering the top 10. It kind of validates my feeling that Italy frequently had scorers of vital goals. Still, the best scorers seem to come from Germany, with Klinsmann rounding out the top 5.

From now on, I will also present only the top 10 after adjusting for GV. But just for completeness, here are the players in the top 10 of IF that did not make the top 10 in IFV: Just Fontaine (8.0), Vavá (7.0), Helmut Rahn (7.0) and, of course, Pelé (7.0).

Also, getting back to IFR: at first it seems like a nice measure of goal scoring prowess, namely what percentage of all your goals are actually important. Looking at the top 10 above, Paolo Rossi’s IFR of 0.944 is super impressive, while all players in the top 3 have IFRs below 0.68. Pelé is even at 0.583! So, should we rank them lower? Maybe, but most importantly, it is definitely easier to have a high IFR if you score fewer goals. In fact, there are 596 players with IFR = 1.0 and none of them has more than 6 goals overall. So, as a prolific goal scorer, you are bound to score a few not-so-important goals. And while the top 5 in IFV are all World Cup winners, only one World Cup winner (Romário) has an IFR of 1.0 and at least 5 goals, while 477 players have IFR = 1.0 and only one goal. So, to me this statistic is imperfect, and I find IFV a better value to describe “greatness”, but still one we will try to improve on.
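For anyone who wants to check the IFR figures quoted here, a small sketch follows. It only uses the goal tallies and IF values given in this series; the function name is made up for illustration.

```python
# (goals, IF) pairs as quoted in this series.
players = {
    "Miroslav Klose": (16, 10.5),
    "Ronaldo": (15, 9.5),
    "Gerd Müller": (14, 9.5),
    "Paolo Rossi": (9, 8.5),
    "Pelé": (12, 7.0),
}


def importance_ratio(goals, importance_factor):
    """IFR = IF / Goals: the share of a player's goals that counted as important."""
    return importance_factor / goals


ifr = {name: round(importance_ratio(goals, points), 3)
       for name, (goals, points) in players.items()}
```

This gives 0.656 for Klose, 0.633 for Ronaldo, 0.679 for Gerd Müller, 0.944 for Rossi and 0.583 for Pelé.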

So, how about taking the average, as we have done before? Will we get Oleg Salenko off his throne? Before revealing this list, just a little note: my writing might be interpreted as me not liking certain players. If I sound disparaging, it only means that I would not like to see their name listed as the best goal scorer ever. Their individual achievements should never be tainted or questioned. I just want a good hard look at every goal tally! So, here are the average IF values, the IFAs:

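| # | Team | Name | GA | IF | IFV | IFA |
|---|------|------|----|----|-----|-----|
| 1 | FRA | Just Fontaine | 2.167 | 8.0 | 5.890 | 0.982 |
| 2 | ITA | Salvatore Schillaci | 0.857 | 6.0 | 5.901 | 0.843 |
| 3 | RUS | Oleg Salenko | 2.000 | 2.5 | 2.457 | 0.819 |
| 4 | POR | Eusébio | 1.500 | 5.5 | 4.755 | 0.793 |
| 5 | SUI | Josef Hügi | 2.000 | 3.5 | 2.353 | 0.784 |
| 6 | BRA | Leônidas | 1.600 | 6.5 | 3.915 | 0.783 |
| 7 | ITA | Christian Vieri | 1.000 | 7.0 | 6.927 | 0.770 |
| 8 | NIR | Peter McParland | 1.000 | 5.0 | 3.681 | 0.736 |
| 9 | DEN | Jon Dahl Tomasson | 0.833 | 4.0 | 4.022 | 0.670 |
| 10 | GER | Gerd Müller | 1.077 | 9.5 | 8.615 | 0.663 |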
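The IFA column appears to be the IFV divided by the number of games played. That reading is an inference from the numbers in the list rather than something spelled out in the text, so treat the sketch below as a plausibility check. The games counts are assumptions too, derived as goals divided by GA (for example 13 / 2.167 = 6 for Fontaine).

```python
def importance_factor_average(ifv, games):
    """IFA, read here as IFV per game played (an inferred definition)."""
    return ifv / games


# (IFV, games played); games are inferred from goals / GA.
sample = {
    "Just Fontaine": (5.890, 6),
    "Salvatore Schillaci": (5.901, 7),
    "Oleg Salenko": (2.457, 3),
}

ifa = {name: round(importance_factor_average(ifv, games), 3)
       for name, (ifv, games) in sample.items()}
```

With these inputs the sketch reproduces the listed values of 0.982, 0.843 and 0.819.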

Again, I find myself liking the pure IFVs better than the IFAs, even though as a professional statistician I should love averages more than totals! The players in the top 5 all played in only a single World Cup, and only Gerd Müller played in a World Cup final, and he is 10th. I agree that Just Fontaine should always be highly rated among great goal scorers simply for netting 13 in a single World Cup, but both his IF and IFV are not all that high. Salvatore Schillaci probably had the most amazing World Cup goal scoring run ever and I do value him highly, but not necessarily in the top 3. And Oleg Salenko achieved a unique feat by scoring 5 in a single game, but then again, his Russian team did not make it past the group stage despite all his scoring efforts. Eusébio and Leônidas would have strong cases for higher positions, but I do not regard them as highly as, say, Gerd Müller, who was top scorer at the 1970 World Cup and scored the game winners in a de facto semifinal and the final in 1974.

Well, the IF is a first step, but as already mentioned above, it is obviously not flawless. It also disregards another factor, one that might be Miroslav Klose’s and Oleg Salenko’s undoing: scoring game-winning goals and scoring goals in the latter stages of the World Cup. So, in the next post, we will try to expand on the importance factor concept.

In Search of the Greatest World Cup Goal Scorer – Part II: Adjusting for World Cup Goal Average

As mentioned in the previous post, goals scored are not entirely comparable. While today’s players get more games, the games played in the early years of the World Cup had more goals. So, in a way it was easier for Just Fontaine to rack up 13 goals in just 6 games, when in 1958 there was an average of 3.6 goals/game. Here is a quick visualization of the World Cup Goal Average (WGA) over the years:

[Figure: World Cup Goal Average (WGA) per tournament, with LOESS trend line]

As we can see from that graph, there were a lot of goals before 1960, and since then there has been a small but steady decline. Maybe things are pointing up again after Brazil 2014, but I doubt it. The solid line is a LOESS trend line, which should smooth out most of the random variation. As we can see, the huge outlier 1954 is treated as such, while most values are scattered closely around the line. Of note is also that this way, the 1990 World Cup is really seen as an abnormality. Whether this is actually the case is a good point for discussion: prior to 1992, a goalie could handle a back pass from his own players, which allowed a defense to manage a lead much better. But also, the weather in 1994 was a lot more demanding, which often leads to more goals. So, let’s see which is better, the raw data or the smoothed line.

So, how do we adjust now? Well, first of all we need a reference point to compare the value of a goal from, say, 1950 to one from, say, 2010. To me it makes the most sense to put everything in reference to the current (i.e. most recent) standing, and that would be the 2014 World Cup. Let’s stick with the raw data at first, which gives us a reference WGA of 2.67 goals/game. Now goals scored in tournaments with a higher WGA should count less, while goals scored in tournaments with a lower WGA should count more. To achieve this, we divide the reference WGA, WGAref, by the individual tournament WGAs, WGAt, and compute the tournament goal values as

GVt = WGAref/WGAt.

Here is the full list of these goal values (based on raw data as well as LOESS smoothing) with reference 2014 (i.e. a goal at the 2014 World Cup has value 1.00):

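| Year | Raw WGA | Raw GV | LOESS WGA | LOESS GV |
|------|---------|--------|-----------|----------|
| 1930 | 3.889 | 0.687 | 4.049 | 0.618 |
| 1934 | 4.118 | 0.649 | 4.118 | 0.608 |
| 1938 | 4.667 | 0.573 | 4.156 | 0.602 |
| 1950 | 4.000 | 0.668 | 3.999 | 0.626 |
| 1954 | 5.385 | 0.496 | 3.724 | 0.672 |
| 1958 | 3.600 | 0.742 | 3.400 | 0.736 |
| 1962 | 2.781 | 0.961 | 3.107 | 0.806 |
| 1966 | 2.781 | 0.961 | 2.895 | 0.865 |
| 1970 | 2.969 | 0.900 | 2.768 | 0.904 |
| 1974 | 2.553 | 1.047 | 2.741 | 0.913 |
| 1978 | 2.684 | 0.995 | 2.695 | 0.929 |
| 1982 | 2.808 | 0.952 | 2.626 | 0.953 |
| 1986 | 2.538 | 1.053 | 2.565 | 0.976 |
| 1990 | 2.212 | 1.208 | 2.545 | 0.984 |
| 1994 | 2.712 | 0.985 | 2.547 | 0.983 |
| 1998 | 2.672 | 1.000 | 2.566 | 0.976 |
| 2002 | 2.516 | 1.062 | 2.494 | 1.004 |
| 2006 | 2.297 | 1.163 | 2.421 | 1.034 |
| 2010 | 2.266 | 1.179 | 2.454 | 1.020 |
| 2014 | 2.672 | 1.000 | 2.503 | 1.000 |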
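The goal values follow directly from GVt = WGAref/WGAt. Here is a minimal sketch for a few raw averages; the function name and the dictionary layout are just illustrative choices. Because the inputs are the rounded averages from the list, the last digit can differ slightly from the published GV for some years.

```python
# Raw World Cup goal averages (goals per game) for selected tournaments.
RAW_WGA = {1954: 5.385, 1958: 3.600, 1990: 2.212, 2014: 2.672}


def goal_values(wga_by_year, reference_year=2014):
    """GV_t = WGA_ref / WGA_t: goals from high-scoring tournaments count less."""
    reference = wga_by_year[reference_year]
    return {year: reference / wga for year, wga in wga_by_year.items()}


raw_gv = goal_values(RAW_WGA)
```

A goal from the 1954 goal fest comes out at roughly half a 2014 goal, while a goal from the low-scoring 1990 tournament is worth about 1.2.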

Now, how do we use these GVs? Well, let’s look at Miroslav Klose as an example. In 2002 and 2006 he scored 5 goals each, in 2010 it was 4 goals, and in 2014 2 more goals. So, his time-adjusted goal tally using raw data is

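Gadj = 5*1.062 + 5*1.163 + 4*1.179 + 2*1.000 ≈ 17.844 (with the rounded GVs shown, the sum is 17.841; the small difference presumably comes from using unrounded GVs).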

So, his 16 goals in 4 tournaments from 2002 to 2014 are the equivalent of 17.844 goals in 2014. Given that Klose scored most of his goals in tournaments with a low WGA, it was obvious that his score would get slightly inflated. We can of course do the same for the smoothed WGAs and arrive at a time-adjusted smoothed goal tally of 16.268 for Klose. With this method it is of course hard to see how Klose could be unseated from his top spot in the overall scorer standings:

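| # | Team | Name | Goals | Gadj | Gadj,LOESS | Rank (LOESS) |
|---|------|------|-------|------|------------|--------------|
| 1 | GER | Miroslav Klose | 16 | 17.844 | 16.268 | 1 |
| 2 | BRA | Ronaldo | 15 | 15.987 | 15.033 | 2 |
| 3 | GER | Gerd Müller | 14 | 13.187 | 12.698 | 3 |
| 4 | GER | Jürgen Klinsmann | 11 | 11.551 | 10.791 | 4 |
| 5 | ENG | Gary Lineker | 10 | 11.148 | 9.790 | 7 |
| 6 | GER | Thomas Müller | 10 | 10.897 | 10.100 | 5 |
| 7 | ESP | David Villa | 9 | 10.386 | 9.202 | 11 |
| 8 | POL | Grzegorz Lato | 10 | 10.269 | 9.205 | 10 |
| 9 | ARG | Gabriel Batistuta | 10 | 10.004 | 9.812 | 6 |
| 10 | BRA | Pelé | 12 | 9.974 | 9.706 | 8 |
| 11 | FRA | Just Fontaine | 13 | 9.648 | 9.571 | 9 |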
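The adjusted tally worked out for Klose above can be reproduced with a few lines of Python. This is only a sketch: the function name is made up, and it uses the GVs rounded to three decimals, so the results land within a few thousandths of the listed 17.844 and 16.268.

```python
# Goal values for Klose's four tournaments, rounded to three decimals.
RAW_GV = {2002: 1.062, 2006: 1.163, 2010: 1.179, 2014: 1.000}
LOESS_GV = {2002: 1.004, 2006: 1.034, 2010: 1.020, 2014: 1.000}

klose_goals = {2002: 5, 2006: 5, 2010: 4, 2014: 2}


def adjusted_goals(goals_by_year, gv_by_year):
    """Time-adjusted tally: each goal is weighted by its tournament's GV."""
    return sum(goals * gv_by_year[year] for year, goals in goals_by_year.items())


klose_raw = adjusted_goals(klose_goals, RAW_GV)      # about 17.84
klose_loess = adjusted_goals(klose_goals, LOESS_GV)  # about 16.27
```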

OK, that surely did not improve things compared to the unadjusted table. Adjusting for goal value demotes the players from yesteryear and promotes players that played more recently – especially in 1990. Fontaine’s incredible 13 goal tally is now worth a lot less, while Lineker and Klinsmann get quite the boost. If we use the smoothed version, the Top 4 remain the same, but Pelé and Fontaine do not drop as much. Kudos to Gerd Müller who as a more senior player steadfastly remains in the Top 3.

So, how about additionally adjusting for games played? This gives us the player’s adjusted goal average, GAadj:

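| # | Team | Name | GA | GAadj | GAadj,LOESS | Rank (LOESS) |
|---|------|------|----|-------|-------------|--------------|
| 1 | RUS | Oleg Salenko | 2.000 | 1.971 | 1.965 | 1 |
| 2 | FRA | Just Fontaine | 2.167 | 1.608 | 1.595 | 2 |
| 3 | POR | Eusébio | 1.500 | 1.441 | 1.297 | 5 |
| 4 | ARG | Guillermo Stábile | 2.000 | 1.374 | 1.236 | 6 |
| 5 | CZE | Tomáš Skuhravý | 1.000 | 1.208 | 0.984 | 9 |
| 6 | COL | James Rodríguez | 1.200 | 1.200 | 1.200 | 7 |
| 7 | HUN | Sándor Kocsis | 2.200 | 1.092 | 1.479 | 3 |
| 8 | ITA | Salvatore Schillaci | 0.857 | 1.036 | 0.843 | 12 |
| 9 | ITA | Christian Vieri | 1.000 | 1.028 | 0.988 | 8 |
| 10 | GER | Gerd Müller | 1.077 | 1.014 | 0.977 | 10 |
| 11 | SUI | Josef Hügi | 2.000 | 0.992 | 1.344 | 4 |
| 12 | BRA | Leônidas | 1.600 | 0.931 | 0.965 | 11 |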
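To make the adjusted average concrete, here is a hedged sketch that divides the adjusted goal tally by games played. The function name is hypothetical, and the games counts are assumptions derived from goals divided by GA (13 / 2.167 = 6 for Fontaine, 6 / 2.000 = 3 for Salenko). The rounded raw GVs are used, so the last digit may be off by one compared with the list.

```python
# Raw goal values for the two tournaments involved.
RAW_GV = {1958: 0.742, 1994: 0.985}


def adjusted_goal_average(goals_by_year, gv_by_year, games):
    """GAadj: GV-weighted goals divided by the number of games played."""
    adjusted = sum(goals * gv_by_year[year] for year, goals in goals_by_year.items())
    return adjusted / games


# Games played are inferred from goals / GA (an assumption, see above).
fontaine = adjusted_goal_average({1958: 13}, RAW_GV, games=6)  # about 1.61
salenko = adjusted_goal_average({1994: 6}, RAW_GV, games=3)    # about 1.97
```

For a player with a single tournament this is simply GA times GV, which is why Salenko barely moves: his 1994 goals carry a GV close to 1.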

Hmmmmmmmm … There are elements of this list that I like (Fontaine, Eusébio and Stábile high up, a good mixture of older and newer players), but two things totally discredit it for me. First, the Bomber (Gerd Müller) is only in 10th place in both rankings. When it comes to pure goal scoring ability, I think one would be hard pressed to find any better player, so a #10 ranking just does not look right. And then, of course, the player on top! Thinking about it, it was obvious that Oleg Salenko would rank very high: a high GA in only a single World Cup, which also has a relatively high GV. And while I could see an argument for Salenko having the best single-game goal scoring achievement in his 5-goal game against Cameroon (if we forget the shambolic defense and poor motivation of the Africans), I truly have a hard time crowning Salenko as the best World Cup goal scorer of all time. In addition to running up the score on an inferior opponent, these goals also came in a game with nothing to play for.

There are also smaller things that I do not like. In particular, Skuhravý is ranked higher than Schillaci. Both played in the same tournament (Italia 1990), but Skuhravý’s goals came in two games: 2 in a 5-1 rout of the USA and 3 in a 4-1 rout of Costa Rica in the second round. Compare that to Schillaci, who scored in 6 different games, and 4 of these goals were game winners and 1 the go-ahead goal in the semi-final! In every regard, Schillaci’s performance was more impressive.

Now, comparing the GAs based on raw data and LOESS smoothing, I do like the raw data better as it mixes the different World Cup periods slightly better. But again, I think that overall both adjusted lists do not satisfy me.

The problem of some players racking up goals against weak opponents while others consistently score important goals is also present in the previous ranking of adjusted total goals. And while it is easy to knock Salenko, the same argument can be applied to a certified super striker: Gabriel Batistuta. 10 goals surely look impressive, but 3 of these came against a weak Greece team in 1994 and 3 more were add-on goals against a similarly out-of-sorts Jamaica. That leaves 4 more goals, two of which were penalties in the second round. Batigol was one of my favorite players of the late 90s, but this is not very impressive. Compare this to Eusébio’s 9 goals: Portugal had to play in the group of death with Brazil, Hungary and Bulgaria, Eusébio had to bring his team back from a 0-3 deficit against the pesky North Koreans, and he then led Portugal to a third place finish with 2 more goals …

So, while I still think that adjusting for tournament GV is a good and necessary step, there is still some way to go to find a satisfying list taking care of the problems inherent in both of the above. So, to improve the listing, I come to three conclusions:

  1. We have to consider not only the fact that a goal was scored, but also how important that goal was. Scoring the third to fifth goals in a 5-0 rout is a nice feat, but all these goals were not important in securing the win. Conversely, scoring three game winning goals in three consecutive games is a highly impressive streak. Also, scoring the game winner in the World Cup final is more important/valuable than doing so in the group stage.
  2. Relative performance is a nice measure, but I do start to prefer the absolute performance. While on a very small scale, the Skuhravý vs. Schillaci comparison highlights the shortcomings of relative performance. If we also take importance into account, I think a player scoring many important goals over a larger number of games should be considered a greater goal scorer than one scoring two game winners in two games.
  3. The smoothed curve was a nice idea, but the raw data so far led to more satisfying results. I will keep comparing the two adjustment methods, but as of now, advantage raw data.

Well, the journey has just begun and I hope to get you deeper into World Cup goal scoring history and some statistics in the next post.