How forecasts are computed
Last updated: 2025-08-30 10:43
Team strength ratings
At the heart of the club soccer forecasts are team strength ratings, which are our best estimate of a team’s overall strength. In our system, every team has an offensive rating that represents the number of goals it would be expected to score against an average team on a neutral field, and a defensive rating that represents the number of goals it would be expected to concede. Given the ratings of any two teams, the model projects the result of a match between them in a variety of formats (such as a league match, a home-and-away tie or a cup final), and it simulates whole seasons to arrive at the probability that each team will win the league, qualify for the Champions League or be relegated to a lower division.
As a season plays out, a team’s ratings are adjusted after every match based on its performance in that match and the strength of its opponent. Unlike with the Elo-based rating systems used in several other sports, a team’s rating doesn’t necessarily improve whenever it wins a match; if it performs worse than the model expected, its ratings can decline. The global team ranking is calibrated so that the best team in any given season of our dataset has a score of 1000, and other teams’ scores are scaled from there.
League strengths
To compare team strengths in European matches and to build a global team ranking, we have to answer the question: how much harder is it to earn an expected margin of +1 goal in one league than in another? To assess the relative strength of domestic leagues, we use recent matches played between teams from different leagues, supplemented with league market values from Transfermarkt, to assign a strength rating to every league for which we have data. To generate these league strength ratings, we first assume that all leagues are of equal strength and then determine how far above or below expectation each league has performed over the past five years.
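Step 3 of the procedure that follows relies on Massey’s Method, and step 4 shrinks its output toward market-value ratings. The Python below is a minimal sketch of both under stated assumptions, not the model’s actual code: each inter-league match is reduced to one residual (actual minus expected goal difference, seen from the first league’s side), league ratings are constrained to sum to zero so that 0 marks an average league, and the shrinkage uses a hypothetical `prior_weight` that behaves like a number of pseudo-matches played at the market-value rating. The helper names, leagues and numbers are all made up for illustration.

```python
from collections import Counter


def solve_linear(a, b):
    """Solve a·x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def massey_league_ratings(matches):
    """matches: (league_a, league_b, residual), where residual = actual minus
    expected goal difference from league_a's side. Returns goals vs. average."""
    leagues = sorted({lg for a, b, _ in matches for lg in (a, b)})
    idx = {lg: i for i, lg in enumerate(leagues)}
    n = len(leagues)
    mat = [[0.0] * n for _ in range(n)]  # Massey matrix (normal equations)
    rhs = [0.0] * n  # net residual accumulated by each league
    for a, b, res in matches:
        i, j = idx[a], idx[b]
        mat[i][i] += 1
        mat[j][j] += 1
        mat[i][j] -= 1
        mat[j][i] -= 1
        rhs[i] += res
        rhs[j] -= res
    # The Massey matrix is singular; swap the last equation for the
    # constraint "ratings sum to zero", so 0 means an average league.
    mat[-1] = [1.0] * n
    rhs[-1] = 0.0
    return dict(zip(leagues, solve_linear(mat, rhs)))


def regress_to_market(massey, market, games, prior_weight=20.0):
    """Hypothetical shrinkage: prior_weight acts as pseudo-matches at the
    market-value rating, so leagues with more inter-league matches keep
    more of their Massey rating."""
    return {
        lg: (games[lg] * massey[lg] + prior_weight * market[lg])
        / (games[lg] + prior_weight)
        for lg in massey
    }


# Toy inputs, made up for illustration.
matches = [("ENG", "ESP", 0.3), ("ENG", "GER", 0.5),
           ("ESP", "GER", 0.1), ("GER", "ENG", -0.4)]
games = Counter(lg for a, b, _ in matches for lg in (a, b))
market = {"ENG": 0.3, "ESP": 0.1, "GER": -0.4}  # hypothetical market-value ratings
raw = massey_league_ratings(matches)
final = regress_to_market(raw, market, games)
print(raw)
print(final)
```

With only a handful of toy matches, the shrunken ratings stay close to the market-value inputs; as a league accumulates inter-league matches, its Massey rating dominates, which is one way to read “weighted by how many inter-league matches we have for each league.”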
The full process has five steps. In order, we:
1/ Run through all domestic matches in history and calculate domestic team strength ratings over time.
2/ Look at each inter-league match from the past five years and calculate the expected score of the match based purely on each team’s domestic rating at the time.
3/ Take the difference between the expected and actual scores and run these results through Massey’s Method to find a rating for each league, expressed as how many goals better or worse than the global average that league is.
4/ Regress these calculated ratings toward market-value-based ratings, weighted by how many inter-league matches we have for each league.
5/ Run through all matches in history one more time, incorporating league strengths into the predictions for any inter-league matches to improve the final team ratings.
After going through that process, our league strengths can be interpreted as a bonus (in goals) given to each team in an inter-league match.
Match performances
Soccer is a tricky sport to model because so few goals are scored in each match. The final scoreline will fairly often disagree with many people’s impressions of the quality of each team’s play, and the low-scoring nature of the sport will sometimes lead to prolonged periods of luck, where a team may be getting good results despite playing poorly (or vice versa). To mitigate this randomness and better estimate each team’s underlying quality of play, our model takes the actual score and adjusts it using Monte Carlo simulations based on expected goals to evaluate each team’s performance after every match, then updates the strength ratings of both teams accordingly.
Forecasting matches
Given two teams’ ratings, the process for generating win/loss/draw probabilities for a given match is four-fold:
1/ We use the teams’ ratings to calculate the number of goals that we expect each team to score during the match.
These projected match scores represent the number of goals that each team would need to score to keep its offensive rating exactly the same as it was going into the match.
2/ We make two adjustments: one for home-field advantage and one that accounts for each team’s goal-scoring efficiency (if a team’s efficiency is extreme, a mean-reverting process adjusts the number of goals expected in the match).
3/ Using our projected match scores and the assumption that goal-scoring in soccer follows a Poisson process, which is essentially a way to model random events occurring at a known rate, we generate two Poisson distributions around those scores. These give us the likelihood that each team will score no goals, one goal, two goals, etc.
4/ We take the two Poisson distributions and turn them into a matrix of all possible match scores, from which we can calculate the likelihood of a win, loss or draw for each team. To avoid undercounting draws, we increase the probabilities of drawn scorelines in the matrix to reflect the actual incidence of draws in a given competition (Dixon-Coles adjustment).
Forecasting seasons
Once we have probabilities for every match to be played in the competition, we run Monte Carlo simulations to play out each league’s season 1,000 times using those forecasts. These simulations are run “hot,” meaning that instead of a team’s ratings remaining static within each simulated season, the ratings rise or fall after each simulated match the team plays. In effect, this widens the distribution of possible outcomes by allowing a weak team to go on a winning streak and increase its ratings substantially, or by allowing for the possibility that a strong team loses its first few games of a simulated season and is penalized accordingly.
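Steps 3 and 4 of the match forecast can be sketched in a few lines of Python. This is an illustrative sketch, not the model’s implementation: it takes the two expected-goal values as already computed (after the home-field and efficiency adjustments), truncates the score grid at a hypothetical `max_goals`, and applies the Dixon-Coles low-score correction with an assumed `rho`. In practice that parameter would be estimated from a competition’s results rather than fixed at -0.1.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when goals arrive at rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def score_matrix(lam_home, lam_away, rho=-0.1, max_goals=10):
    """m[i][j] = P(home scores i, away scores j), with the Dixon-Coles
    correction; rho is an assumed value and must keep all four factors > 0."""
    m = [[poisson_pmf(i, lam_home) * poisson_pmf(j, lam_away)
          for j in range(max_goals + 1)]
         for i in range(max_goals + 1)]
    m[0][0] *= 1 - lam_home * lam_away * rho  # 0-0
    m[0][1] *= 1 + lam_home * rho             # 0-1
    m[1][0] *= 1 + lam_away * rho             # 1-0
    m[1][1] *= 1 - rho                        # 1-1
    total = sum(map(sum, m))  # renormalise: the grid stops at max_goals
    return [[p / total for p in row] for row in m]


def outcome_probs(m):
    """Sum the matrix below (home win), on (draw) and above (away win) the diagonal."""
    home = sum(p for i, row in enumerate(m) for j, p in enumerate(row) if i > j)
    draw = sum(m[i][i] for i in range(len(m)))
    away = sum(p for i, row in enumerate(m) for j, p in enumerate(row) if i < j)
    return home, draw, away


# Toy expected goals for one match (assumed already adjusted for home field).
home, draw, away = outcome_probs(score_matrix(1.6, 1.1))
plain_draw = outcome_probs(score_matrix(1.6, 1.1, rho=0.0))[1]
print(f"home {home:.3f}  draw {draw:.3f} (plain Poisson {plain_draw:.3f})  away {away:.3f}")
```

With a negative `rho`, the correction moves probability mass onto 0-0 and 1-1 and away from 1-0 and 0-1, so the draw probability rises while the total stays at one.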
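A “hot” season simulation can be sketched as below. Everything model-specific here is an assumption for illustration: expected goals come from a hypothetical multiplicative formula (offence times opponent defence, divided by a league-average goal rate, times a home factor), chosen so that against an average opponent on a neutral field a team’s expected goals equal its offensive rating; ratings are nudged after each simulated match by a hypothetical step size `k` times the gap between simulated and expected goals; and the fixture list is a fixed double round robin. Setting `k=0` turns the same loop into a “cold” simulation with static ratings.

```python
import math
import random
from itertools import permutations


def sample_poisson(lam, rng):
    """Knuth's multiplication method; fine for soccer-sized rates."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_season_hot(ratings, n_sims=1000, k=0.05, home_factor=1.15,
                        avg_goals=1.35, seed=0):
    """ratings: team -> (offence, defence) in goals vs. an average team.
    k, home_factor and avg_goals are hypothetical; k=0 gives a cold run."""
    rng = random.Random(seed)
    titles = dict.fromkeys(ratings, 0)
    fixtures = list(permutations(ratings, 2))  # double round robin (home, away)
    for _ in range(n_sims):
        r = {t: list(v) for t, v in ratings.items()}  # fresh ratings per season
        points = dict.fromkeys(ratings, 0)
        for h, a in fixtures:
            # Hypothetical expected-goals model; the floor keeps rates positive.
            lam_h = max(0.05, r[h][0] * r[a][1] / avg_goals * home_factor)
            lam_a = max(0.05, r[a][0] * r[h][1] / avg_goals)
            gh, ga = sample_poisson(lam_h, rng), sample_poisson(lam_a, rng)
            if gh > ga:
                points[h] += 3
            elif gh < ga:
                points[a] += 3
            else:
                points[h] += 1
                points[a] += 1
            # "Hot" step: move ratings toward the simulated performance.
            r[h][0] += k * (gh - lam_h)
            r[h][1] += k * (ga - lam_a)
            r[a][0] += k * (ga - lam_a)
            r[a][1] += k * (gh - lam_h)
        # Champion = most points; ties broken at random.
        champion = max(points, key=lambda t: (points[t], rng.random()))
        titles[champion] += 1
    return {t: c / n_sims for t, c in titles.items()}


# Toy ratings, made up for illustration.
teams = {"A": (2.0, 0.9), "B": (1.5, 1.2), "C": (1.2, 1.4), "D": (1.0, 1.6)}
title_odds = simulate_season_hot(teams)
print(title_odds)
```

Because each simulated season starts from a fresh copy of the ratings, a lucky early run inflates a team’s ratings only within that one season, which is the widening effect described above.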