Large-scale Analysis of Chess Games

5 millions of chess games (300+ million of chess positions) have been recorded from the very beginning of chess history to the last tournaments of Magnus Carlsen. It's time to analyse all of them!

This repository contains different resources (e.g., Java code) to analyse chess games. Current status:

We are writing a technical report on various statistics of the chessgame database (e.g., number of unique positions);
We are using Igrida Cluster for large computations with Stockfish UCI Engine

Do not hesitate to participate or contact us!

Objectives

We hope to gather various interesting insights on the skills, ratings, or styles of (famous) chess players. In fact numerous applications can be and have been considered such as cheat detection, computation of an intrinsic, "universal" rating, or the determination of key moments chess players blunder. For instance we would like to answer a question like "Who are the best chess players in history?"

For doing so, you typically need to analyze millions of moves with chess engines; it requires lots of computations. Our goal is to propose an open infrastructure for large-scale analysis of chess games. Specifically, we aim to:

replicate state-of-the-art research results (e.g., on cheat detection or intrinsic ratings);
provide open data and procedures for exploring new directions;
investigate software engineering/scalability issues when computing millions of moves;
organize a community of potential contributors for fun and profit.

Static analysis

We are essentially analysing headers information (related to players' ratings, dates, openings, etc.). We qualify the analysis as "static" (as opposed to "dynamic", see below) since we do not analyse moves with chess engines. Until now we have:

parsed various PGN files and structure each game in a (relational) database
processed games with Spark SQL in order to generate CSV files together with R scripts to compute statistics

We are writing a technical report on various statistics of the database (e.g., number of unique positions)

Chess games are saved in PGN file, for example

[Event "FIDE Candidates 2014"]
[Site "Khanty-Mansiysk RUS"]
[Date "2014.03.25"]
[Round "10.1"]
[White "Karjakin,Sergey"]
[Black "Andreikin,D"]
[Result "1/2-1/2"]
[WhiteTitle "GM"]
[BlackTitle "GM"]
[WhiteElo "2766"]
[BlackElo "2709"]
[ECO "B46"]
[Opening "Sicilian"]
[Variation "Taimanov variation"]

1. e4 c5 2. Nf3 e6 3. d4 cxd4 4. Nxd4 Nc6 5. Nc3 a6 6. Nxc6 bxc6 7. Qd3 Qc7 8. Qg3 Qxg3 9. hxg3 d5 10. g4 Rb8 11. g5 f6 12. gxf6 Nxf6 13. e5 Nd7 14. f4 Nc5 15. Rh3 a5 16. b3 Ba6 17. Bxa6 Nxa6 18. Na4 Rb4 19. Bd2 Re4+ 20. Kf1 Bb4 21. c3 Ba3
22. Re1 Rxe1+ 23. Kxe1 O-O 24. Ke2 h6 25. Rg3 Kf7 26. Rh3 Kg6 27. Rg3+ Kf7 28. Rh3 Kg6 29. Rg3+ Kf7 1/2-1/2

As you can notice, each PGN file is separated in two parts: (1) headers (2) moves

The static analysis first parses each file and inspects each interesting headers information like White Elo Rating, Result or Date. We can also get an iterator on moves to get information about Pieces Captured Count or Pieces Moves Count... We parse all games with a Java parser in order to analyze different moves (promotions, king castling, captured pieces...) and generate FENs.

Dynamic analysis

We will analyse all generated FEN on Igrida Cluster with Stockfish UCI Engine (e.g., depth 20 with multi-pv 1). All logs are saved in files and afterwards in a (relational) database.

rnbqkbnr/pppppppp/8/8/8/5N2/PPPPPPPP/RNBQKB1R b KQkq - 1 1
info depth 1 seldepth 1 multipv 1 score cp 783 nodes 47 nps 47000 time 1 pv e4b4.
info depth 2 seldepth 2 multipv 1 score cp 807 nodes 119 nps 119000 time 1 pv e4b4 e6e7 b4b2 c2c1.
info depth 3 seldepth 4 multipv 1 score cp 813 nodes 213 nps 106500 time 2 pv e4b4 e6e7 b4b2 c2c1 f4d3 c1d1.
info depth 4 seldepth 6 multipv 1 score cp 813 nodes 351 nps 175500 time 2 pv e4b4 e6e7 b4b2 c2c1 f4d3 c1d1.
info depth 5 seldepth 7 multipv 1 score cp 905 nodes 711 nps 355500 time 2 pv e4b4 e6e7 b4b2 c2c1 f4d3 c1d1 a3a2.
info depth 6 seldepth 9 multipv 1 score cp 1056 nodes 2120 nps 706666 time 3 pv e4b4 e6e7 b4b2 c2c3 f4d5 c3d3 d5e7 h4h5.
...

For instance, Stockfish can calculate a score and evaluation of each move:

Ply	Move	White	Black	Score	Eval	Comment
0	[1]	e4		0.26	0.26
1	[1]		c6	0.4	-0.14
2	[2]	d4		0.34	-0.06
3	[2]		d5	0.31	0.03
4	[3]	Nc3		0.41	0.1
5	[3]		dxe4	0.3	0.11
6	[4]	Nxe4		0.31	0.01
7	[4]		Bf5	0.37	-0.06
8	[5]	Ng3		0.32	-0.05
...	...	...	...	...	...
69	[35]		Ne5	0.0	-0.03
70	[36]	Kd2		-0.06	-0.06
71	[36]		Nd7	-0.16	0.1
72	[37]	Ke3		-2.6	-2.44	Erreur
73	[37]		c5	-2.91	0.31

We can easily generates the evolution score of the game, in fact :

Key numbers

Because it's not possible to make a lot of reservation on Igrida, I run 3 sort of jobs :

Small job during the day (9h-17h)
Medium job during the night (17h-5h)
Long job during the week and week-end

Name	Number of jobs	Duration	Number of FEN to analyse per job	Total of FEN to analyse	Total worktime
Short Job	10 000	8h	6 000	60 000 000	80 000h
Medium Job	9 000	12h	10 000	90 000 000	108 000h
Long Job	4 000	64h	30 000	120 000 000	256 000h
Total	23 000	-		270 000 000	444 000h

We can see that it takes 444k hours to analyse our 270 millions FEN. To give you an idea, that represents 18 500 days (50 years) on a unique machine.

Our 270 millions FEN takes only 15.5 Go memory before analysis. Yet, each analysis has 3 ko logs (~3000 characters per log). So, we expect about 800 Go logs.

In average, Stockfish calculates 7 millions combinaisons at depth 20. So, we can think Igrida has, at total, calculates more than 2e15 nodes (~2 000 000 000 000 000).

Contact

This project is part of a research internship at Inria/IRISA laboratory DiverSE team on chess analysis:

François Esnault (MSc Student, University of Rennes 1) is the main developer and contributor of the project.
Mathieu Acher (Associate Professor, University of Rennes 1) is supervising the project.

Name		Name	Last commit message	Last commit date
Latest commit History 49 Commits
.settings		.settings
lib		lib
resources		resources
src/main/java		src/main/java
.classpath		.classpath
.gitignore		.gitignore
.project		.project
README.md		README.md
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Large-scale Analysis of Chess Games

Objectives

Static analysis

Dynamic analysis

Key numbers

Contact

About

Releases

Packages

Contributors 2

Languages

ChessAnalysis/chess-analysis

Folders and files

Latest commit

History

Repository files navigation

Large-scale Analysis of Chess Games

Objectives

Static analysis

Dynamic analysis

Key numbers

Contact

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages