All site revenue goes to charity

About Elo Ratings

Ratings were introduced in December 2020. We use a customized version of the Elo rating model, which is widely used in chess and sporting events for predicting outcomes between two competitors. The most distinctive aspect of this implementation is that freecell players don’t compete head-to-head over a freecell game, so our model handles this indirectly by rating games as well as players. In other words, each specific deal is assigned a rating, and players and games exchange points based on how expected or unexpected the outcome was. In this way players compete against each other using the games as a proxy.

How it works

Only streak play is rated. Separate ratings have been developed for tournament and HotStreak play. Rankings are for current players only. Names are removed from published rankings after 14 days of inactivity. While most players favor the standard 8x4 game, the site offers a vast array of variants, each with its own lists of best streaks. Elo ratings bring this all together with their ability to account for differences in difficulty between variants and levels, to provide a single ranking for overall player performance. The idea here is that we can now start to compare streak players across variants and difficulty levels, and we can now compare game variants as well (see chart below).

The ratings are designed to answer a simple question: given a particular deal in streak play, how likely is this player to win? The premise of the Elo model is that we can quantify this likelihood based on the difference in ratings between game and player, and then use the actual outcome to improve our prediction for the next event.

Average Elos for each variant

 
Rows are variants; columns are difficulty levels 5 through 12.

| Variant | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|
| 4x4 | 2684 | 2688 | 2737 | 2815 | 2900 | 2928 | 3037 | 3114 |
| 4x5 | 2298 | 2383 | 2482 | 2531 | 2623 | 2664 | 2734 | 2816 |
| 4x6 | 2113 | 2142 | 2197 | 2272 | 2347 | 2431 | 2506 | 2594 |
| 4x7 | 2055 | 2057 | 2098 | 2135 | 2210 | 2288 | 2360 | 2452 |
| 4x8 | 1803 | 1839 | 1937 | 2044 | 2077 | 2208 | 2304 | 2336 |
| 4x9 | 1632 | 1771 | 2001 | 2100 | 2182 | 2351 | 2423 | 2412 |
| 4x10 | 1474 | 1569 | 1749 | 1865 | 1946 | 2058 | 2185 | 2236 |
| 5x3 | 2412 | 2446 | 2521 | 2596 | 2671 | 2747 | 2822 | 2896 |
| 5x4 | 2168 | 2182 | 2290 | 2354 | 2422 | 2508 | 2587 | 2673 |
| 5x5 | 2102 | 2111 | 2129 | 2162 | 2208 | 2314 | 2407 | 2520 |
| 5x6 | 1837 | 1945 | 2013 | 2074 | 2160 | 2238 | 2301 | 2396 |
| 5x7 | 1649 | 1668 | 1679 | 1727 | 1823 | 1885 | 2087 | 2082 |
| 5x8 | 1363 | 1516 | 1606 | 1721 | 1827 | 1909 | 1994 | 2037 |
| 5x9 | 1219 | 1307 | 1437 | 1515 | 1558 | 1646 | 1774 | 1840 |
| 5x10 | 1082 | 1082 | 1176 | 1280 | 1290 | 1261 | 1426 | 1496 |
| 6x2 | 2405 | 2226 | 2390 | 2472 | 2540 | 2629 | 2690 | 2767 |
| 6x3 | 2157 | 2145 | 2131 | 2168 | 2250 | 2334 | 2421 | 2517 |
| 6x4 | 1981 | 1975 | 2073 | 2200 | 2300 | 2362 | 2477 | 2568 |
| 6x5 | 1548 | 1606 | 1746 | 1854 | 1985 | 2080 | 2198 | 2182 |
| 6x6 | 1289 | 1355 | 1534 | 1664 | 1742 | 1835 | 1865 | 1883 |
| 6x7 | 1235 | 1243 | 1376 | 1488 | 1554 | 1583 | 1665 | 1701 |
| 6x8 | 1052 | 1070 | 1137 | 1227 | 1328 | 1298 | 1477 | 1535 |
| 6x9 | 899 | 858 | 874 | 854 | 856 | 819 | 1185 | 1277 |
| 6x10 | 897 | 818 | 808 | 812 | 832 | 590 | 645 | 734 |
| 7x1 | 2332 | 2362 | 2390 | 2409 | 2473 | 2558 | 2634 | 2710 |
| 7x2 | 2032 | 2081 | 2197 | 2349 | 2415 | 2505 | 2531 | 2587 |
| 7x3 | 1744 | 1747 | 1941 | 2052 | 2139 | 2222 | 2311 | 2308 |
| 7x4 | 1367 | 1446 | 1567 | 1728 | 1773 | 1810 | 1870 | 1868 |
| 7x5 | 1175 | 1184 | 1335 | 1426 | 1517 | 1568 | 1665 | 1698 |
| 7x6 | 1051 | 1024 | 1195 | 1295 | 1371 | 1388 | 1514 | 1558 |
| 7x7 | 875 | 884 | 926 | 955 | 1007 | 801 | 853 | 915 |
| 7x8 | 850 | 780 | 760 | 794 | 780 | 663 | 903 | 937 |
| 7x9 | 811 | 697 | 645 | 664 | 683 | 414 | 496 | 497 |
| 8x0 | 2421 | 2452 | 2458 | 2523 | 2598 | 2672 | 2754 | 2824 |
| 8x1 | 2060 | 2099 | 2141 | 2197 | 2291 | 2361 | 2428 | 2496 |
| 8x2 | 1695 | 1704 | 1903 | 2015 | 2107 | 2203 | 2259 | 2261 |
| 8x3 | 1272 | 1358 | 1520 | 1658 | 1769 | 1796 | 1863 | 1884 |
| 8x4 | 1031 | 1011 | 1060 | 1085 | 1122 | 1125 | 1310 | 1331 |
| 8x5 | 942 | 937 | 982 | 1049 | 1105 | 972 | 1128 | 1184 |
| 8x6 | 798 | 780 | 798 | 840 | 878 | 646 | 1009 | 1076 |
| 8x7 | 753 | 764 | 777 | 796 | 746 | 424 | 527 | 723 |
| 8x8 | 730 | 678 | 683 | 678 | 661 | 410 | 569 | 545 |
| 9x0 | 2268 | 2236 | 2429 | 2476 | 2470 | 2547 | 2547 | 2585 |
| 9x1 | 1745 | 1771 | 1794 | 1948 | 2084 | 2193 | 2184 | 2292 |
| 9x2 | 1266 | 1319 | 1449 | 1578 | 1682 | 1717 | 1842 | 1841 |
| 9x3 | 1045 | 1039 | 1147 | 1235 | 1316 | 1311 | 1484 | 1516 |
| 9x4 | 962 | 943 | 955 | 967 | 1002 | 923 | 1049 | 1132 |
| 9x5 | 661 | 618 | 619 | 635 | 652 | 609 | 889 | 1038 |
| 9x6 | 701 | 612 | 639 | 670 | 689 | 449 | 714 | 823 |
| 9x7 | 764 | 724 | 725 | 762 | 736 | 431 | 410 | 659 |
| 10x0 | 1991 | 2130 | 2266 | 2347 | 2403 | 2351 | 2482 | 2406 |
| 10x1 | 1361 | 1447 | 1506 | 1673 | 1728 | 1762 | 1868 | 1922 |
| 10x2 | 1006 | 1025 | 1151 | 1238 | 1353 | 1305 | 1429 | 1504 |
| 10x3 | 810 | 812 | 886 | 943 | 996 | 784 | 1035 | 1099 |
| 10x4 | 791 | 747 | 765 | 791 | 821 | 430 | 692 | 712 |
| 10x5 | 713 | 655 | 648 | 649 | 706 | 422 | 691 | 766 |
| 10x6 | 564 | 559 | 567 | 572 | 585 | 376 | 560 | 679 |
| 11x0 | 1669 | 1785 | 1939 | 2034 | 2129 | 2126 | 2122 | 2139 |
| 11x1 | 1003 | 1030 | 1110 | 1190 | 1276 | 1300 | 1433 | 1461 |
| 11x2 | 793 | 835 | 897 | 935 | 973 | 743 | 1029 | 1079 |
| 11x3 | 710 | 729 | 757 | 788 | 820 | 498 | 582 | 584 |
| 11x4 | 642 | 627 | 561 | 623 | 525 | 401 | 528 | 601 |
| 11x5 | 638 | 557 | 556 | 589 | 514 | 390 | 400 | 400 |
| 12x0 | 1308 | 1339 | 1547 | 1657 | 1742 | 1840 | 1862 | 1864 |
| 12x1 | 674 | 670 | 714 | 742 | 805 | 704 | 994 | 1029 |
| 12x2 | 687 | 685 | 708 | 735 | 769 | 387 | 653 | 579 |
| 12x3 | 631 | 549 | 461 | 520 | 508 | 386 | 387 | 387 |
| 12x4 | 574 | 449 | 454 | 581 | 536 | 392 | 398 | 558 |
| 13x0 | 1003 | 1012 | 1038 | 1068 | 1128 | 1033 | 1331 | 1397 |
| 13x1 | 625 | 622 | 661 | 690 | 720 | 478 | 846 | 881 |
| 13x2 | 470 | 467 | 481 | 483 | 494 | 434 | 433 | 448 |
| 13x3 | 466 | 471 | 468 | 496 | 489 | 313 | 357 | 342 |

Getting started

Every new player begins with a rating of 1500, a common starting point in Elo systems. Games could have also been assigned 1500 to start, but this would have ignored information we already have about individual games and the set they belong to. We also have far more games than players to rate, thousands of players versus millions of games, so a better starting point was needed.

To do this, the win/loss record of each game in December 2020 was used to assign an initial rating. Note that this was a one-time event, and game stats no longer play any role in Elo ratings. Game stats differ in important ways from ratings, because we don’t know who played those games and we don’t know if it was streak play or a timed event. But as a historical note, here’s how ratings were assigned.

Basically we took the game's play history, adjusted it slightly toward the mean for that level, and then assigned ratings straight from the Elo formula that corresponds with that win%, assuming a 1500-level opponent. Ratings for levels 6–12 were scaled up based on a calculation of how a player's rating increases after winning ten games in the level below. Finally, whole-level adjustments were made in almost every level based on play testing to bring them into parity with each other, and the Elo model now continues to fine-tune things.

As an example of how ratings were assigned, let’s say a 7x4-5 game had been beaten 1 time out of 10 plays, for a 10% player win rate. Before assigning its initial rating we adjusted to account for the fact that we don't know that much about a game after ten plays. For instance one more win would have doubled its win%, which is significant.

Since the cumulative player win% for 7x4-5 is 64%, we add five more fictitious plays at the average win rate for this set of games, meaning we pretend it was beaten 3.2 times out of the next 5, giving an adjusted win rate of 4.2 wins out of 15 plays = 28%. In other words, the fewer plays a game has, the more we assume it’s a typical game for that variant and rate it accordingly.

If the same game showed 10 wins out of 100 plays, it has the same win% but now we know more about it. This time adding the 5-game adjustment has much less impact, and the game is rated as 13.2/105=12.6%. The 1 out of 10 game would be rated 1668 and the 10 out of 100 game would be rated 1837. This method was used for all levels 5 through 12 where streak play is possible, with an additional bump in the ratings of games beyond level 5 to account for the presumably higher average rating of the players there. New games being played for the first time are assigned the average rating for their respective variant and difficulty level.
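The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the steps described above, not the site's code: the function names and the `prior_plays` parameter are made up for the sketch, and the bump for levels above 5 and the whole-level adjustments are left out.

```python
import math

def adjusted_win_rate(wins, plays, level_win_rate, prior_plays=5):
    """Blend a game's record with fictitious plays at the level's average win rate."""
    return (wins + prior_plays * level_win_rate) / (plays + prior_plays)

def rating_from_win_rate(win_rate, opponent_rating=1500):
    """Invert the Elo expectation: the game rating at which a player rated
    opponent_rating would be expected to win win_rate of the time."""
    return opponent_rating + 400 * math.log10((1 - win_rate) / win_rate)

few = adjusted_win_rate(1, 10, 0.64)     # 4.2 / 15 = 0.28
many = adjusted_win_rate(10, 100, 0.64)  # 13.2 / 105, about 0.126
print(round(few, 3), round(many, 3))
print(round(rating_from_win_rate(few)), round(rating_from_win_rate(many)))
```

Under these assumptions the 10-out-of-100 game comes out at about 1837, matching the figure above. The 1-out-of-10 game comes out near 1664, a few points below the quoted 1668; the gap may come from rounding or from adjustments not spelled out here.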

The math

Elo ratings represent the likelihood that a player will win or lose a particular game. The formula for expected win% is 1 / (1 + 10**((game rating - player rating) / 400)). So if a 1500-rated player is dealt a 1000-rated 8x4 game, we would say there’s a 1 / (1 + 10**((1000-1500) / 400)) = 94.68% chance the player wins and only a 5.32% chance she loses.
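As a quick check, the expected win% formula can be written as a small Python function. The name `expected_win` is just a label chosen for this sketch.

```python
def expected_win(player_rating, game_rating):
    """Probability that the player beats the game under the Elo formula."""
    return 1 / (1 + 10 ** ((game_rating - player_rating) / 400))

p = expected_win(1500, 1000)
print(f"{p:.2%} win, {1 - p:.2%} loss")  # 94.68% win, 5.32% loss
```

When the two ratings are equal the function returns exactly 0.5, which is the sense in which an evenly matched game is a coin flip.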

These percentages also define how ratings adjust based on the actual outcome. We use a constant K of 8 points, which is the max point exchange between player and game. If the result was expected, and the player above wins, her rating increases by 5.32% of 8 or 0.43 points. If she loses, her rating decreases by 94.68% of 8 or 7.57 points. Points gained by a player are taken by the game, and vice versa. So her new rating will be 1500.43 if she wins or 1492.43 if she loses.
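The point exchange for a single game can be sketched as follows. This assumes a plain zero-sum exchange between one player and one game with K = 8; the function names are invented for the example, and any further adjustment to game ratings is not modeled.

```python
K = 8  # maximum point exchange per game, as stated above

def expected_win(player_rating, game_rating):
    """Probability that the player beats the game under the Elo formula."""
    return 1 / (1 + 10 ** ((game_rating - player_rating) / 400))

def play(player_rating, game_rating, player_won):
    """Return (new_player_rating, new_game_rating) after one game."""
    score = 1.0 if player_won else 0.0
    delta = K * (score - expected_win(player_rating, game_rating))
    # Points gained by the player are taken from the game, and vice versa.
    return player_rating + delta, game_rating - delta

win_player, win_game = play(1500, 1000, True)
loss_player, loss_game = play(1500, 1000, False)
print(round(win_player, 2), round(loss_player, 2))  # 1500.43 1492.43
```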

That’s the whole story in terms of the player ratings. There’s more going on behind the scenes though when it comes to the games, as we needed to create some leverage to balance the impact on players and games. We do this by taking a few hundredths of every point gain or loss on an individual game and applying it to all 32,768 games in that variant/difficulty level. In other words, all the games in 8x3-8 get a small boost up or down based on what happens to any individual 8x3-8 game that gets played. This gives us years’ worth of adjustment in days, which is not too much given how many more games there are than players. This extra “boost,” either up or down, is scaled to the frequency of play for that level, so as long as a variant gets some play we’re able to get enough adjustment to bring it in line with the others.
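To make the leverage idea concrete, here is a rough Python sketch. Everything specific in it is an assumption: the share of 0.03 merely stands in for "a few hundredths", the function and variable names are invented, the played game is assumed to receive the level-wide nudge along with every other game, and the scaling by frequency of play is left out entirely.

```python
LEVEL_SHARE = 0.03  # ASSUMPTION: placeholder for "a few hundredths"; the real value is not given

def apply_game_result(level_ratings, game_id, game_delta, share=LEVEL_SHARE):
    """Apply a point change to one game, then nudge every game in its level.

    level_ratings maps game id -> rating for one variant/difficulty level.
    game_delta is the number of points the played game gained (+) or lost (-).
    """
    level_ratings[game_id] += game_delta
    boost = share * game_delta
    for gid in level_ratings:
        level_ratings[gid] += boost
    return boost

# Hypothetical pool: every 8x3-8 game starting at the level average of 1658.
level = {n: 1658.0 for n in range(32768)}
boost = apply_game_result(level, game_id=7, game_delta=-4.0)
average_shift = sum(level.values()) / len(level) - 1658.0
print(boost, average_shift)
```

In this sketch a single 4-point loss by one game moves the level average by roughly 0.12 points, about a thousand times more than the 4/32,768 it would move without the shared adjustment. That multiplier is the kind of leverage the paragraph above describes.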

What does a rating mean?

Elo ratings are a self-correcting predictive tool and not a score. If this were a head-to-head competition like chess, a 200-point difference means the higher rated player would expect to win 76% of the time. Some top players are 600 or more points above the starting rating, meaning they'd expect to outsolve an average player 97% of the time.
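The percentages quoted here follow directly from the rating gap. A small Python sketch (the function name is chosen just for this example):

```python
def win_chance(rating_gap):
    """Expected win rate for the side that is rating_gap points higher."""
    return 1 / (1 + 10 ** (-rating_gap / 400))

for gap in (0, 200, 600):
    print(gap, f"{win_chance(gap):.0%}")  # 0 50%, 200 76%, 600 97%
```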

A rating is also focused on recent performance. You can think of it like a thermometer: it’s always adjusting based on the current temperature. The previous temperature is the starting point, but once it moves it doesn’t remember the old reading. A rating provides an interesting measure of overall solving ability, but may frustrate players who make it their primary focus. Ideally you check out your rating to see how you stack up and to be amazed at the talented field of players we have here, and then go back to running up streaks in your favorite variants.

Individual game ratings are only an approximation, and except perhaps in 8x4 will never reach their true level. That’s fine, as long as the average for the whole level reaches its true rating, since presumably players will face a large sample of games and some will be rated too high and others too low. Also, at this point the ratings don’t know the difference between really hard games and unwinnable games, so variants with lots of unwinnables will tend to have higher average ratings to compensate.

On the player side, ratings reach their true level much faster. To get there fastest some may opt for what a chess player might call “sharp” play, choosing variants with ratings close to their own where something, good or bad, is bound to happen. Opponents with close ratings push apart like magnets with like charges. Others will choose to protect a rating by only playing specific variants.

Eventually it won’t matter where you play because the variants will naturally move toward parity with each other. And since player ratings are set relative to game ratings, over time it will become impossible to maintain a rating built on play in specific variants that were previously overrated. In the meantime, if you want to know that your rating is an accurate representation of your ability, the best bet is to play in a variety of variants and difficulty levels. This has the added benefit of speeding along the process of getting all the variants into parity with each other. Feel free to look for variants you feel are overrated, though; your play will help bring them in line.

Strategy

There’s nothing you have to do to improve your rating, except play better obviously. Good and bad streaks will happen, and it’s normal to see a rating fluctuate even by dozens of points if you play a lot. Note that if a player wins exactly the number of games predicted by their rating during a day, the rating will be unchanged. If you lose one more game than expected, your rating will drop by 8 points. Players are human and deals are random. Performance can vary by a lot more than one game, even if the ratings were perfect. So if you get down, keep playing. Ratings have no memory; they’re free-floating and not held back by previous performance.
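The "one game more or less than expected" rule can be illustrated with a short Python sketch. It is a simplification: expectations are computed from the player's starting rating for the whole batch, ignoring the small drift that happens game by game, and the function names and sample results are hypothetical.

```python
K = 8  # maximum point exchange per game

def expected_win(player_rating, game_rating):
    """Probability that the player beats the game under the Elo formula."""
    return 1 / (1 + 10 ** ((game_rating - player_rating) / 400))

def net_change(player_rating, results):
    """Approximate rating change over a batch of games.

    results is a list of (game_rating, player_won) pairs.
    """
    expected = sum(expected_win(player_rating, g) for g, _ in results)
    wins = sum(1 for _, won in results if won)
    return K * (wins - expected)

# Hypothetical day: ten games rated equal to the player, so 5 wins are expected.
as_expected = [(1500, True)] * 5 + [(1500, False)] * 5
one_short = [(1500, True)] * 4 + [(1500, False)] * 6
print(net_change(1500, as_expected), net_change(1500, one_short))  # 0.0 -8.0
```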

One point of caution: Elo ratings do not care if this is the first game of your streak or the hundredth, so play every game like it matters and don’t let your guard down on those early ones. Also, where Winnable versions of a variant exist, it’s marginally preferable to play these over the regular version of the same variant, where you might risk losing points to a game that other players won’t have to face. This difference is minor and transient, since most unwinnable games in these variants have been assigned very high ratings already and any points they take you’ll begin to get back with your next game, but this may help you add a few Elo points.

How we got here

Before he passed away SlowPoker (part of the original Ratings Crew) imagined devising a rating system for streak play here. He wanted to use the Elo system but he wanted to give each game a rating, sort of a man against machine approach. So basically each game would develop its own rating over time as would each player. These ratings represent the fruition of that idea. After the initial launch extensive play testing was done and manual adjustments made to the games level by level. Then more adjustments were made based on anomalies players uncovered, and finally the “secret sauce” part of the algorithm was fully implemented to let the machine do the work of boosting game averages up or down. We continue to monitor the adjustments the model is making to game averages, and it’s working very well.

Keep branching out, everyone. Play those odd variants and higher difficulty levels if you aren’t worried about protecting a streak. It all helps. Don’t worry, none of you are breaking the rating system. If you choose to play up instead of starting at level 5 that actually helps us get some coverage in lesser played games. Just know that if a rating is built on games that seem to be rated too high, you'll find that playing anything else will bring it back down.

Frequently Asked Questions

Can a player improve their rating by winning lots and lots of easy games?
If a 1900-level player played and won about 600 level 10 10x6s, his rating would go up by one point. To gain another point he’d have to win about 1,000 more. Another 1,800 games gets him a third rating point. The average player would have lost 2 games at that point based on the 99.95% win rate for this variant, so this player would have shown he deserved those three hard-earned points by not losing. So yes, it’s possible in theory, but there are diminishing returns to doing this, and only so many hours in a day.
What is the impact of losing a game that should have been rated higher?
It won't matter at all after a few days’ play. First of all there are as many underrated games as overrated, and you’re as likely to encounter one as the other. But regardless, a bigger than expected drop has no permanent impact. It doesn’t just get averaged away, it’s eventually erased. The player’s rating will go up more for wins and down less for losses until she ends up in the same place.
I won a hard game, shouldn’t I have gained more points?
Maybe. No game’s rating is exactly right. But remember that game stats can be deceiving. Many level 5 players are not especially skilled. Many games appeared in tournaments where there was no penalty for playing fast and loose. And finally remember the ratings know who you are, so they expect more from highly rated players. Consider it a compliment and trust that you’ll also play games that were overrated and on average things will work out.
I want to play a certain variant, but it seems underrated. What should I do?
Your call, but playing it is how we fix this. For aligning across variants and difficulty levels to work the model needs data, meaning someone needs to branch out and spend time playing variants and difficulty levels they normally wouldn’t. It may mean using the Custom option to reach lesser played levels as well. Doing this occasionally can also be a reality check on your rating, since it should come back up when you return to other variants.
What about unwinnable games?
There’s nothing to be gained from playing an unwinnable game, whether for your streak or your rating. In terms of impact, they actually matter a lot less for Elo ratings than for streaks. An unwinnable game resets your streak to zero; it might set your Elo rating back a few points. But if your rating takes a hit from an unwinnable game, don’t sweat it. The model will give you more points for your next win and take fewer for your next loss until you're right back where you belong.

Those games with a high number of plays and no wins have already been assigned ratings near 3000 to minimize their impact. We used this number so as not to distort the averages too much, since we're already near parity and averages are important for assigning new ratings. This will continue to be refined. Meanwhile it’s helpful to remember that every game’s rating is off to one degree or another; the unwinnables only stand out because we can tell when it’s off. The system is designed to work despite that.

Why is there such a wide range of ratings between level 5 and 12 of the same variant?
Generally speaking, the level 5 game ratings started out moderately underrated, and the level 10, 11, and 12 games were clearly overrated. The range for some of these has compressed to a difference of 200–300 points between levels 5 and 12, especially in 8x4 (300-point difference between levels 5 and 12) and the easier variants. Then it widens as variants get more difficult, then shrinks again when you get to the impossibly difficult ones. We'll learn more as the ratings get better over time, but at this moment it looks like difficulty level makes the biggest difference in 4x9, 4x10, 5x8, 6x5, 7x3, 8x3, and 10x1.
How can I see how my recent play has impacted my rating?
Hop created a great tool for this here:

Hop's Freecell Elo Calculator

Copy the data directly from your Recent Play and paste it here to see game by game Elo changes.
I see other games with identical stats to the one I played but with different ratings. What’s happening?
Now that ratings are up and running we’re no longer assigning ratings based on game stats, so comparing ratings based on game stats is not going to get you anywhere. Game ratings now adjust based on point exchanges between 0 and 8 points, and not on the ratings of games with similar records. The initial ratings are not the model, they're just a starting point. Game stats are not ratings.


All content copyright ©2021 Freecell.net
Maintained by Dennis Cronin