Elo ratings calculation #1

ch3cksout · 2023-10-21T04:45:02Z

Using the Elo calculation implemented in the Ordo program,
with the provided nominal Stockfish strengths as anchors,
I determined the following ratings from the games in this repo (omitting the isolated text-davinci-003 group):

   # PLAYER                    :  RATING  POINTS  PLAYED   (%)
   1 Stockfish_Elo2035         :  2035.0    16.0      20    80
   2 Stockfish_Elo1954         :  1954.0    16.5      19    87
   3 Stockfish_Elo1871         :  1871.0   187.5     252    74
   4 Stockfish_Elo1785         :  1785.0   104.0     153    68
   5 Stockfish_Elo1694         :  1694.0    20.5      40    51
   6 gpt-3.5-turbo-instruct    :  1682.9   193.0     485    40
   7 Stockfish_Elo1597         :  1597.0     8.5      22    39
   8 gpt-3.5-turbo             :  1401.2     6.0       8    75
   9 gpt-4                     :  1377.5    63.0     123    51
  10 RANDOM chess engine       :   760.6     1.0     110     1
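
To relate the rating gaps in the table to expected results, here is a minimal sketch of the standard logistic Elo expectation. It assumes the conventional scale of 400 points; Ordo's own model and scale may differ, and the function name `expected_score` is only illustrative.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard logistic Elo model.

    Assumption: base-10 logistic curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Ratings copied from the table above.
instruct = 1682.9        # gpt-3.5-turbo-instruct
stockfish_1871 = 1871.0  # Stockfish_Elo1871 (anchor)

# Expected score per game for gpt-3.5-turbo-instruct against Stockfish_Elo1871.
e = expected_score(instruct, stockfish_1871)
print(f"expected score: {e:.3f}")
```

Under this assumption, a gap of roughly 190 points corresponds to an expected score of about one quarter of the available points.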

Note that accurately determining Elo ratings for chess engines is a non-trivial task, as has long been recognized in the computer chess community. In particular, a simplistic tournament-performance approach (FIDE lookup table) is not a good one.
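
To illustrate what an anchored fit does differently from a per-tournament lookup, here is a rough sketch that holds the anchor ratings fixed and adjusts every other rating until each player's expected points match their actual points across all pairings at once. This is not Ordo's actual algorithm; the function `fit_anchored`, the player names, and the results are hypothetical, and the 400-point logistic scale is an assumption.

```python
from collections import defaultdict


def expected_score(rating_a, rating_b):
    # Standard logistic Elo expectation (assumed 400-point scale).
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def fit_anchored(results, anchors, sweeps=100):
    """Fit ratings for non-anchored players while keeping anchors fixed.

    results: list of (player_a, player_b, points_a, games) pairings.
    anchors: dict mapping player name to a fixed rating.
    """
    pairings = defaultdict(list)
    for a, b, points_a, games in results:
        pairings[a].append((b, points_a, games))
        pairings[b].append((a, games - points_a, games))

    ratings = {p: anchors.get(p, 1500.0) for p in pairings}
    free = [p for p in pairings if p not in anchors]

    for _ in range(sweeps):
        for p in free:
            actual = sum(points for _, points, _ in pairings[p])
            # Expected points grow monotonically with the rating, so bisect.
            # A player scoring 0% or 100% would simply run into these bounds.
            lo, hi = 0.0, 4000.0
            for _ in range(50):
                mid = (lo + hi) / 2.0
                expected = sum(
                    games * expected_score(mid, ratings[opp])
                    for opp, _, games in pairings[p]
                )
                if expected < actual:
                    lo = mid
                else:
                    hi = mid
            ratings[p] = (lo + hi) / 2.0
    return ratings


# Hypothetical toy data: one unrated engine against two anchored opponents.
anchors = {"anchor_1500": 1500.0, "anchor_1700": 1700.0}
results = [
    ("engine_a", "anchor_1500", 6.0, 10),
    ("engine_a", "anchor_1700", 4.0, 10),
]
ratings = fit_anchored(results, anchors)
print(round(ratings["engine_a"], 1))
```

Because all pairings are fitted jointly, each player's rating reflects the strength of every opponent it faced, rather than a single averaged opposition rating.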
