Predicting the World Cup: Will predictive models choose correctly?

Written by Simone Pampuri | 15 February 2018 15:29:34 Z

Arguably, Manchester City are the sleeping giants of the Premier League. Local rivals United have won the League 13 times, while City have won it only twice. City’s victory in 2018, however, looks all but certain.

In late 2017, City equalled the Premier League record for consecutive wins, only to lose to Liverpool. The joke goes that Wenger sold Oxlade-Chamberlain to Liverpool with the sole intention of keeping Arsenal’s ‘invincible’ record – though the truth is a shade more complex than football banter.

While a cash boost from the Abu Dhabi government and Pep Guardiola’s master management go a long way to securing victory, statistics have also played a crucial part in City’s rise to dominance.

The changing face of football

With a staff of data scientists and expensive technology on hand, City hope to use their analytics to identify players like Raheem Sterling before they come attached to a £49 million price tag.

Statisticians use cloud technology to track coaching sessions and generate game plans, as well as monitoring fitness, GPS, sleep, and general health and conditioning.

Statistics are used on two levels: to visualise and present data for ‘easy’ consumption, and to produce detailed breakdowns that help predict outcomes, for example how certain players might fare in varying conditions.

Statistics company Opta is widely regarded as the field leader, with clients like Sky, BBC, and IBM relying on its research. Opta collects upwards of 2,000 ‘events’ during a match – and most recently, it turned its hand to the upcoming World Cup.

Opta and predicting the World Cup

Opta’s World Cup Predictor was built from its extensive database of past international and World Cup matches – and it has chosen Brazil as its eventual winner. So how did it do it?

In short: statistical wizardry and Opta’s predictive model, which was run 50,000 times to produce its final probabilities.

In Opta’s model, each team is assigned an attacking and defensive strength based on past performances and several other World Cup ‘variables’. A result is then simulated for each game, and playing out every fixture in this way simulates the entire tournament and produces an eventual winner.
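To make the idea concrete, here is a minimal sketch of this kind of simulation. It is not Opta’s code: the four teams, their strength values, the average of 1.3 goals, the Poisson goal model and the coin-flip tie-break are all assumptions chosen for illustration. Only the figure of 50,000 runs comes from the article.

```python
import math
import random
from collections import Counter

# Hypothetical strengths (not Opta's figures). An attack above 1 scores more
# than average; a defence below 1 concedes less than average.
TEAMS = {
    "A": {"attack": 1.4, "defence": 0.8},
    "B": {"attack": 1.1, "defence": 0.9},
    "C": {"attack": 0.9, "defence": 1.1},
    "D": {"attack": 0.7, "defence": 1.3},
}
BASE_GOALS = 1.3  # assumed average goals per team per match


def poisson(rng, lam):
    """Draw a Poisson-distributed goal count using Knuth's method."""
    threshold = math.exp(-lam)
    goals, product = 0, rng.random()
    while product > threshold:
        goals += 1
        product *= rng.random()
    return goals


def play_match(rng, team_1, team_2):
    """Simulate one knockout match and return the winner's name."""
    lam_1 = BASE_GOALS * TEAMS[team_1]["attack"] * TEAMS[team_2]["defence"]
    lam_2 = BASE_GOALS * TEAMS[team_2]["attack"] * TEAMS[team_1]["defence"]
    goals_1, goals_2 = poisson(rng, lam_1), poisson(rng, lam_2)
    if goals_1 == goals_2:
        # Assumption: a drawn knockout game is settled by a coin flip.
        return rng.choice([team_1, team_2])
    return team_1 if goals_1 > goals_2 else team_2


def simulate_tournament(rng):
    """Play two semi-finals and a final, returning the champion."""
    finalist_1 = play_match(rng, "A", "D")
    finalist_2 = play_match(rng, "B", "C")
    return play_match(rng, finalist_1, finalist_2)


def win_probabilities(runs=50_000, seed=0):
    """Repeat the tournament and report each team's share of titles."""
    rng = random.Random(seed)
    wins = Counter(simulate_tournament(rng) for _ in range(runs))
    return {team: wins[team] / runs for team in TEAMS}


if __name__ == "__main__":
    print(win_probabilities())
```

Each run produces a single champion, so the output is simply the fraction of runs each team won. That is why the headline figures read as percentages rather than as one firm forecast.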

Other variables Opta considers include ‘home advantage’: a bonus for the host nation and all the teams from the host confederation. Teams who have won the World Cup since 1970 are also given an extra advantage.
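One plausible way to encode these adjustments is as multipliers on a team’s base strength. The sketch below is an assumption, not Opta’s method: the article does not say how large the bonuses are or how they combine, so the multiplier values, the `Team` class, the `adjusted_attack` function and the attack ratings are all hypothetical. The hosts, confederations and title years are real.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical multipliers; the article does not state the real sizes.
HOST_BONUS = 1.20
CONFEDERATION_BONUS = 1.05
PAST_WINNER_BONUS = 1.05


@dataclass(frozen=True)
class Team:
    name: str
    confederation: str
    attack: float                     # hypothetical base attacking strength
    last_title: Optional[int] = None  # year of most recent World Cup win


def adjusted_attack(team, host, since=1970):
    """Apply host, confederation and past-winner bonuses to a team's attack."""
    factor = 1.0
    if team.name == host.name:
        factor *= HOST_BONUS
    elif team.confederation == host.confederation:
        # Assumption: the host gets the larger bonus instead of both.
        factor *= CONFEDERATION_BONUS
    if team.last_title is not None and team.last_title >= since:
        factor *= PAST_WINNER_BONUS
    return team.attack * factor


# Attack values below are illustrative only.
RUSSIA = Team("Russia", "UEFA", attack=1.0)
GERMANY = Team("Germany", "UEFA", attack=1.5, last_title=2014)
BRAZIL = Team("Brazil", "CONMEBOL", attack=1.6, last_title=2002)
URUGUAY = Team("Uruguay", "CONMEBOL", attack=1.2, last_title=1950)

if __name__ == "__main__":
    for team in (RUSSIA, GERMANY, BRAZIL, URUGUAY):
        print(team.name, round(adjusted_attack(team, host=RUSSIA), 3))
```

Under these assumptions Germany picks up both the confederation and past-winner bonuses for a tournament in Russia, while Uruguay, whose last title predates 1970, gets neither.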

Crucially, the model doesn’t consider player-specific information. For example, if Harry Kane were to be injured, England’s chances would likely decrease – but the model wouldn’t adapt until England actively started losing.

Of course, Opta’s simulation also doesn’t take into consideration sheer human unpredictability. It would never have predicted Greece’s victory in the 2004 Euros, where Greece were outliers at 150-1 to win.
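For a sense of how remote 150-1 is, fractional odds can be converted into an implied probability. The short sketch below assumes the usual reading of 150-1 as 150 units won per 1 staked; the function name `implied_probability` is made up for this example.

```python
from fractions import Fraction


def implied_probability(odds_against, stake=1):
    """Odds of `odds_against`-`stake` imply stake / (odds_against + stake)."""
    return Fraction(stake, odds_against + stake)


greece = implied_probability(150)
print(greece)                   # 1/151
print(f"{float(greece):.2%}")   # 0.66%
```

Bookmakers build a margin into their prices, so the true chance they had in mind was probably a little lower still.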

Data-led predictions and results

Defending champions Germany are second favourites, with an 11.4 percent chance of winning, while Argentina and France round out the top four.

England, who boast a strong squad of young players including Sterling, Alli and Kane, have been given a 53.9 percent chance of qualifying from their group, with only a 4.7 percent chance of making it to the final – and an even slimmer 1.9 percent chance of winning outright.

Morocco and Panama have been given the smallest chance of success, with 0.5 and 0.7 percent respectively. Moral of the story? Don’t bet on either of them. Or do, if you subscribe to the Greek footballing methodology of defying the odds.

Brazil’s actual probability of winning is 15 percent, so the outcome is not nearly as clear-cut as the label of favourite might first suggest. In 85 percent of Opta’s simulations, a team other than Brazil won, which points to a relatively open field.
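The arithmetic behind those percentages is easy to check. The sketch below takes the 50,000 runs and the 15 percent figure from the article. The standard-error formula is the usual one for a proportion estimated from independent runs, which is an assumption about how the simulations were carried out.

```python
import math

RUNS = 50_000     # number of simulations, from the article
P_BRAZIL = 0.15   # Brazil's share of simulated titles, from the article

brazil_wins = round(RUNS * P_BRAZIL)  # 7500 simulated tournaments
other_wins = RUNS - brazil_wins       # 42500 simulated tournaments

# Sampling noise of the estimate, assuming independent runs:
# about 0.0016, i.e. roughly 0.16 percentage points.
std_error = math.sqrt(P_BRAZIL * (1 - P_BRAZIL) / RUNS)

print(brazil_wins, other_wins, round(std_error, 4))
```

That small error only measures noise from the simulation itself. It says nothing about whether the model’s assumptions are right, which is where the real uncertainty lies.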

Whether Opta is right or not, its study of the World Cup makes for an interesting talking point around data. Regardless of what an organisation intends to use data for, a great model needs to take the relevant variables into consideration in order to create a blueprint for success.

It’s not enough to have a great team of data scientists; you need domain experts who fully understand how the system will actually be implemented in real terms.

Inputting your data into a computer and running an algorithm isn’t a shortcut to success. Instead, you’ll need the knowledge of domain experts and data scientists – and a clear understanding of what you actually want to achieve.

However finely tuned Opta’s algorithm is, whether it proves right may well depend on the things it doesn’t account for: human error and unpredictable player-specific events.

Ultimately, the only way to see what’ll actually happen is to wait until June for kick-off in Moscow. And while predictive models can take some of the guesswork out of future events, Greece’s surprise 2004 victory in the Euros proves that they just can’t take away the fun.

Are you interested in creating a predictive model for your business?

If you want to harness the power of Big Data in your business, Statwolf’s data analytics service can help, with advanced online data visualisation and analysis simply running in your web browser.

We offer a range of custom services to suit your needs: advanced data analysis and modelling, custom algorithm creation, and fraud analysis.
