So, there are already too many ideas in my head, and one of them is that the series on the greatest World Cup goal scorer did not look finished to me. Over my Easter break I had a few more thoughts on this topic, and I think I came up with something that is an even better ranking than the previous one. But then I was too much in jersey mode, as these posts typically are easier and quicker to write. Also, I found many nice ones to look at. At the moment, though, I see that many jerseys are being released, but I feel the information on shirt lettering is still insufficient, and for the life of me I don’t know why hardly anybody can find a proper England away jersey. This little pause enables me to get to a (potentially) final installment of the goal scoring series – to be fair, I think every great series has seven parts! 😉 And as I have said, there are more World Cup specific posts in my head already.
The last ranking of the World Cup’s greatest goal scorers left me a bit dissatisfied, especially the non-averaged one. Yes, I feel I can defend it, but there surely is a way to improve here. My argument for improvement is along the following lines: not only should we take into account whether a goal was scored past the first stage of the tournament, but there surely is also more value to be put on goals scored when the team is deep in a tournament. Or, simply put, a goal should count for more if fewer teams are still in contention for the Cup. Sounds weird? Well, the mathematician in me is always looking for short, concise and precise statements, but unfortunately these are not always the easiest way to grasp an idea. Sorry about that! Let me spell it out: a goal in the World Cup final should count more than a goal in the semifinals, which in turn should count more than a goal scored in the quarterfinals, etc. You get the idea, and hopefully what I said above makes more sense now. 🙂
So, how do we go about it? Well, I think a multiplicative factor should be used here. Taking the current tournament format, I am thinking that a goal in the final should count twice as much as a goal in the group stage and we can use linear increments for the stages in between. This results in the following stage factors:
- 1.00 for the group stage (32 teams in contention)
- 1.25 for the round of 16 (16 teams in contention)
- 1.50 for the quarterfinal (8 teams in contention)
- 1.75 for the semifinal and the third place playoff (4 teams in contention and one medal game)
- 2.00 for the World Cup final (2 teams in contention)
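For those who prefer code to bullet points: the factors above follow a simple pattern, namely that every halving of the field adds 0.25. Here is a minimal Python sketch of that rule. The function name `stage_factor` is made up for illustration, and the third place playoff is simply treated as a game with four teams in contention.

```python
# Team counts that correspond to a stage in the 32-team format.
VALID_TEAM_COUNTS = (32, 16, 8, 4, 2)


def stage_factor(teams_in_contention: int) -> float:
    """Return 1.00 for 32 teams, plus 0.25 for every halving of the field."""
    if teams_in_contention not in VALID_TEAM_COUNTS:
        raise ValueError(f"unsupported number of teams: {teams_in_contention}")
    # 32 // teams is a power of two; bit_length() - 1 is its exponent,
    # i.e. the number of times the field has been halved.
    halvings = (32 // teams_in_contention).bit_length() - 1
    return 1.0 + 0.25 * halvings


if __name__ == "__main__":
    for teams in VALID_TEAM_COUNTS:
        print(teams, stage_factor(teams))
```

Passing any other team count raises an error on purpose, since the odd formats need a judgment call rather than a formula.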
So far, so good, but this leaves two questions open: a) how does this translate to tournaments prior to 1998, when the 32-team format was adopted, and b) how do we apply this factor? Let’s tackle these issues in order:
I think the solution for the 16 team tournaments is relatively easy: Determine the number of teams in contention and assign the factor accordingly. Given that the 1930 and 1950 tournaments were also in essence 16 team tournaments, we can apply the same logic there, too. So, that covers all tournaments from 1930 to 1978, but we need to run through a few special cases:
- 1930: Although the first round consisted of only 13 teams, I would still consider it as a full group stage and not raise the stage factor any further. Therefore all games at the group stage get stage factor 1.25.
- 1950: The group stage here will be handled the same way as in 1930. And, as already outlined, the final round concluded in two quasi-finals: one for third place, one for the Cup. Now, since only four teams were in contention at that stage, we do not need to separate the Sweden-Spain game from the first four second round games. However, Brazil-Uruguay is a quasi-final and should count as such and have stage factor 2.00.
- 1954: There were two playoff games for final spots in the quarterfinals. Now, this is a bit tricky, but I would still count them as part of the group stage and thus not raise the value of the stage factor.
- 1958: Again, there were three playoff games for spots in the quarterfinals and again, I will consider them as part of the group stage.
- 1974: Here a second group stage was introduced for the quarterfinal round and all games should count as such. I do however make two exceptions for Netherlands-Brazil and Germany-Poland as these were quasi-semifinals and count them as such.
- 1978: Same format, same treatment. Also, without any quasi-semifinals, no need to raise the importance of any game.
That leaves us with four 24-team tournaments. 1986 to 1994 are rather straightforward except for the initial group stage. But then again, the only difference to the current group stage is that there are fewer teams, so in essence we can treat it like the group stage of a 32-team tournament and give these games a stage factor of 1.00. And that leaves the eternal special case of 1982, whose format I really would have liked to see adopted at EURO 2016. The first round again gets stage factor 1.00, and most of the games in the second group stage can count like a round of 16 (although it is a round of 12) and use stage factor 1.25. But then, there were three quasi-quarterfinals: France-Northern Ireland, Germany-England and Italy-Brazil. All of these will count as such and get a stage factor of 1.50. I think it all makes sense – I hope you agree with me. And yes, we could have averaged etc., but I am not sure if it would have been entirely fair.
That leads us to the second issue: how do we use the stage factor? Well, I am in favor of keeping it simple: for each goal, take its GV4 from the previous post and multiply it by the stage factor. That keeps the value comparable to goals scored, and we call the resulting value GV5 (yes, I am pulling these names from my behind, kind of). This way we take into account all the factors we have discussed so far: goal value, importance factor, game winning goal, elimination game and stage of tournament. Applying this algorithm to the goal data and taking the sum of the GV5s, we get the following ranking:
*this is just the sum of the stage factors for each player.
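In code, the GV5 calculation is about as simple as it sounds. Here is a minimal Python sketch, assuming each goal comes as a tuple of player, GV4 and stage factor. The function names and the tuple layout are my own for illustration, and I am not reproducing any real goal data here.

```python
from collections import defaultdict


def gv5_totals(goals):
    """Sum GV4 * stage factor per player.

    goals: iterable of (player, gv4, stage_factor) tuples, one per goal.
    """
    totals = defaultdict(float)
    for player, gv4, factor in goals:
        totals[player] += gv4 * factor
    return dict(totals)


def rank_by_gv5(goals):
    """Return (player, GV5 sum) pairs, highest sum first."""
    return sorted(gv5_totals(goals).items(), key=lambda item: item[1], reverse=True)
```

A goal with GV4 of 1.0 in a final therefore contributes 2.0, the same goal in the group stage of a 32-team tournament only 1.0.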
Now, this list I like much better! I guess any list that includes Pelé has a certain air of legitimacy around it. But what I really like here is not only the order (I was surprised that Paolo Rossi ranks that high, but then just look at his 1982 World Cup), but also that it includes players with careers as far back as the 1950s (and even two of them!). Ronaldo profits from the fact that he scored two goals in a World Cup final, while the absence of such goals counts against Miroslav Klose. Vavá, like Pelé, also profits from goals scored in a final. And, above them all, Gerd Müller enjoys a very comfortable lead. The more I look at it, the more I come to the conclusion that this is indeed an ultimate list.
So, one thing is left: averaging GV5 according to games played, which we call (of course) GA5. Here it is:
Here, I am also quite happy with the adjustment, as the only player I had a problem with (Skuhravý) is not on this list anymore. Clearly, scoring a hat-trick in the 1966 World Cup final was bound to help Geoff Hurst, but then if you do so, you deserve to be high up the list. It is also remarkable that there are four players who made both lists: Salvatore Schillaci, Gerd Müller, Paolo Rossi and Vavá. Coincidentally, these are also the four players with the most games among the GA5 Top 10. Lastly, note that while the previous list had no player from before 1958, this one has no player from after 1990. So, taking averages tends to benefit players from further back in the day. And then, the most recent one towers comfortably over all of them. Salvatore Schillaci truly was a phenomenon in 1990. Even more so, as he never again lived up to the hype he had generated that summer in his home country.
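The averaging behind GA5 can be sketched in a few lines of Python as well. I am assuming one record per player with the GV5 sum and the number of World Cup games played; the function names and the record layout are hypothetical.

```python
def ga5(gv5_total: float, games_played: int) -> float:
    """Average GV5 per World Cup game played."""
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    return gv5_total / games_played


def top_by_ga5(records, n=10):
    """Return the n best (player, GA5) pairs, highest average first.

    records: iterable of (player, gv5_total, games_played) tuples.
    """
    scored = [(player, ga5(total, games)) for player, total, games in records]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:n]
```

This also shows why a player with one outstanding tournament can top the list: few games in the denominator make a strong average easy to keep.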
Now, let’s conclude this post and briefly discuss which one should be considered the ultimate list. In a way, I do like both of them, but my gut feeling tells me that GV5 holds more water than GA5. I said it before and I say it again: as a statistician, I always prefer averages over sums. But in this case, I feel more comfortable with using sums, as they put a bit more emphasis on longevity. Both lists contain superstars and great strikers, but also one-trick ponies that only did it at one tournament and whose club careers did not match their World Cup achievements. But I feel more at home with GV5.
Now, if you wanted to extract the essence from both lists, the four players on both lists would remain in the discussion for greatest all-time goal scorer. How could we resolve that? One way would be to average their rankings on both lists. In this case, Schillaci just edges out Müller at 3.5 vs. 5. With considerable distance, Paolo Rossi is third with 6.5, ahead of Vavá with 8. Case closed? Just consider the following: if you were to pick a striker for your squad, would you rather take Salvatore Schillaci or Gerd Müller? If I am guaranteed the 1990 form, then maybe the former. But considering the entire track record, you would be a fool not to go with Müller. So, my emphasis stays with the GV5 ranking and Gerd Müller as best striker of all time! And given their status in the world game during their time, I am quite happy to have Ronaldo and Paolo Rossi on the podium. Now, the case is closed!
P.S.: While I truly enjoyed going through the goal scoring history of the World Cup and digging out all these players from the past, one particularly remarkable (if not satisfying) fact remains: the discussion did not even ONCE mention the two most prominent players of our time. No, in World Cup history, both Lionel Messi and Cristiano Ronaldo are also-rans with relatively poor goal scoring records. In fact, the two are weaker versions of Miroslav Klose and Gabriel Batistuta, respectively. And given that both are currently in the discussion of best ever, I hold their World Cup record strongly against them. Will Russia re-write the tale for one of them?
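For the record, the rank averaging used above is nothing more than the following Python sketch. The individual ranks in the example are not read off the tables but back-derived from the statements in this post: Müller leads GV5 and averages 5, so he must be 9th on GA5; Schillaci leads GA5 and averages 3.5, so he must be 6th on GV5. Treat them as my inference, and the function name as made up.

```python
def average_rank(gv5_rank: int, ga5_rank: int) -> float:
    """Mean of a player's positions on the GV5 and GA5 lists."""
    return (gv5_rank + ga5_rank) / 2


# (GV5 rank, GA5 rank), inferred from the text rather than copied from the tables.
inferred_ranks = {
    "Gerd Müller": (1, 9),
    "Salvatore Schillaci": (6, 1),
}

if __name__ == "__main__":
    # Lower average rank is better, so sort ascending.
    for player, ranks in sorted(inferred_ranks.items(), key=lambda kv: average_rank(*kv[1])):
        print(player, average_rank(*ranks))
```

Note that this treats both lists as equally important, which is exactly the assumption I end up rejecting in favor of GV5.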