Performance Reports #40
There is a lot of information here.
I have some model reports in
I have started a recent run. This is using ggp-zero (the reversi-alpha-zero implementation was an inspiration!). ggp-zero is a generic implementation of a 'zero' method and can train many different games. By 'zero' I mean starting with a random network and training via self-play using (PUCT or a variant of) MCTS. However, at this point the implementation (and goals) are quite divergent from AlphaZero (I also drew inspiration from 'Thinking Fast and Slow'). For this run, I am running with multiple policy heads and multiple value heads, with no turn flipping and no symmetry/rotation of the network. A previous run reached approximately NTest level 7, but there were no records. I will keep this up to date this time!
@gooooloo what does the ntest:6 result (6/1/3) for step-418500 mean?
@apollo-time in my report, ntest:6 means the opponent is NTest with strength 6, and 6/1/3 means 6 wins, 1 draw, 3 losses. By the way, you can find the win/draw game saves at https://github.com/gooooloo/reversi-alpha-zero-models/tree/master/ggf
@gooooloo thanks for the reply. I see your model is very good. My model can't beat NTest depth 5 yet.
policy loss: 0.4
@gooooloo Um... but you use game history, don't you?
Yes. I am actually guessing that the number of ResNet blocks in the model should be reduced depending on the number of historical boards: the fewer historical boards, the shallower the model should be. There is a bigger chance of overfitting if the network's input space is large. I sometimes observe a strange, bad move from my model right after several strong moves, which I cannot understand. Overfitting could be an explanation, but I cannot be sure.
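To make the input-space point concrete, here is a minimal sketch of an AlphaZero-style plane encoding for 8x8 reversi, where the plane count grows linearly with the number of historical boards. The exact layout in reversi-alpha-zero may differ, and `encode_input` is an illustrative name, not the repo's API:

```python
import numpy as np

# Illustrative AlphaZero-style input encoding for 8x8 reversi:
# T historical positions -> 2*T stone planes plus one side-to-move plane,
# so more history means a larger input space for the network to fit.
def encode_input(own_history, opp_history, history_length):
    """own_history/opp_history: lists of 8x8 binary boards, newest first.
    Returns an array of shape (2*history_length + 1, 8, 8)."""
    planes = []
    for t in range(history_length):
        planes.append(own_history[t])   # own stones at time t
        planes.append(opp_history[t])   # opponent stones at time t
    planes.append(np.ones((8, 8)))      # constant side-to-move plane
    return np.stack(planes)

boards = [np.zeros((8, 8)) for _ in range(8)]
print(encode_input(boards, boards, 1).shape)  # (3, 8, 8)  -- no extra history
print(encode_input(boards, boards, 8).shape)  # (17, 8, 8) -- 8 boards of history
```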
I finished up my latest run, ending up somewhere between ntest 5-10, depending on the phase of the moon. Not too shabby. |
@richemslie congrats on getting the result. ntest5 is already strong, I feel.
I have a detailed evaluation metric here. The model that beats ntest6 with 6/1/3 is step-418500 with 400 sims; it loses to ntest7... But it beats ntest9 with 7/0/3. It seems to me that beating ntest:x doesn't imply beating ntest:(x-1) or ntest:(x-2)...
@gooooloo I looked at your player's GameTree and I see you subtract virtual loss only after all simulations (800).
Do you mean I should subtract virtual loss after every simulation, not wait until all sims are finished? That's what I do inside backup(). My function naming is not accurate though; I will refine it later...
I just changed the implementation to a single-threaded approach 3 days ago. I think the efficiency is ... Besides, I am using ... Yet my model was mostly trained with this version of the GameTree implementation. You can check that too.
@gooooloo Oh, I see you subtract virtual loss inside backup(). |
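For readers following this exchange, here is a minimal sketch of the per-simulation virtual-loss pattern being described, with illustrative names rather than the repo's actual GameTree code: virtual loss is applied while descending the tree and reverted inside backup(), once per simulation.

```python
# Sketch of per-simulation virtual loss in parallel MCTS (illustrative).
VIRTUAL_LOSS = 3

class Node:
    def __init__(self):
        self.n = 0    # visit count
        self.w = 0.0  # accumulated value

def apply_virtual_loss(path):
    # During selection: make the path look worse so that concurrent
    # simulations are steered toward different branches.
    for node in path:
        node.n += VIRTUAL_LOSS
        node.w -= VIRTUAL_LOSS

def backup(path, value):
    # After evaluation: revert the virtual loss and record the real
    # result -- this runs once per simulation, not after all 800.
    for node in path:
        node.n += 1 - VIRTUAL_LOSS
        node.w += value + VIRTUAL_LOSS

path = [Node(), Node()]
apply_virtual_loss(path)
backup(path, value=1.0)
print(path[0].n, path[0].w)  # 1 1.0 -- virtual loss fully undone
```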
Report: my step-473800 model with a 30-minutes-per-game time control beats ntest lv7-10, draws with lv11, beats lv12-13, and loses to lv14 (2 draws, 2 losses). The game model can be found here; the evaluation game saves can be found here. Why use 30 min? Actually I am targeting 5 min with a C++ implementation on an 8-core-CPU, 1-GPU machine. But I don't have a C++ implementation yet, so I am trying the current Python implementation with 30 min. I believe they will be similar in simulation_number_per_move, which is about 13000. And I believe 5 minutes for a C++ implementation is a reasonable time control for evaluating the AI's strength.
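As a rough sanity check of the 13000 sims/move figure, here is a back-of-the-envelope calculation. It assumes the 30 minutes are one player's clock and that a reversi game runs about 60 plies; both are assumptions, not from the report:

```python
# Back-of-the-envelope: implied simulation throughput for the 30-minute
# time control above. Clock interpretation and move count are assumptions.
time_per_player_s = 30 * 60   # 30-minute clock per player (assumed)
moves_per_player = 30         # ~60 plies total in a reversi game (assumed)
sims_per_move = 13000         # figure from the report

time_per_move_s = time_per_player_s / moves_per_player
print(time_per_move_s)                    # 60.0 seconds per move
print(sims_per_move / time_per_move_s)    # ~217 simulations per second
```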
Report: I am getting Evaluator results as below. Many draws against the last-generation model. That is interesting. In this case, even if I am lucky enough to get a new model later with 10 more wins than losses, it is hard to be convinced that it is a better model; I would regard it as just randomness. I think maybe this is exactly the reason DeepMind doesn't use the AlphaGo Zero approach for Chess and Shogi. Yes, they mention that reason in the AlphaZero paper, but I think I am experiencing it myself now ^^ Will change to the AlphaZero approach soon...
@gooooloo I think one of the advantages of AlphaZero is that it always switches to the latest model.
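For context, a minimal sketch of the two promotion schemes being contrasted. The 55% threshold is from the AlphaGo Zero paper; `play_match` is a hypothetical helper returning (wins, draws, losses), not code from this repo:

```python
# AlphaGo Zero gates candidates behind an evaluator; AlphaZero does not.
def alphago_zero_promote(candidate, best, play_match, n_games=400):
    wins, draws, losses = play_match(candidate, best, n_games)
    decided = wins + losses
    # Many draws leave few decided games, so the gate becomes noisy --
    # the problem described in the report above.
    if decided == 0:
        return best
    return candidate if wins / decided > 0.55 else best

def alphazero_promote(candidate, best):
    # AlphaZero always continues from the latest network, no gate.
    return candidate
```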
Wondering which is currently the strongest model. Is Challenge 1 still the strongest? Also, |
Hi - new record for gzero (of a different kind - playing equal to ntest level 3 after only 12 hours of training). Discovered a pretty bad bug with PUCT constants this morning, which is the reason for a new run. Some points:
@richemslie Your C++ implementation looks very interesting. I'm curious how many self-play games per second or minute you can generate. Taking into account your small architecture, it may be possible to compare your speed with that of Python-only implementations.
@AranKomat - it is hard to give exact numbers for comparison, but using a batch size of 1024 on a 1080ti card, it can be 100% saturated with 2 C++ threads. I am seeing 20480 model evaluations per second, and hence for ~60 moves at 800 evaluations per move, that is about 2.3 seconds per game. For the tiny (initial) network (8 res blocks, 64 filters, 64 hidden), it took 3 C++ threads to saturate, and it was about twice as fast (1.1 seconds per game).
@mokemokechicken achieved 22 seconds per game with 256 filters, a single 1080ti, and half as many sims/move. I assume it would take 22s per game with two 1080ti's and 800 sims/move. Though the architecture's FLOPS are (256/96)^2 ≈ 7 times larger, GPU and TF performance scale weirdly. So, if I assume yours would take 2.3×3 s per game with the same architecture as moke's, the speedup from using C++ is probably 22/(2.3×3) ≈ 3.2 times? I'm looking forward to your updates!
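The arithmetic from the two comments above, spelled out. The ×3 architecture penalty for the C++ engine is the guess made in the comment, not a measurement:

```python
# Throughput arithmetic from the comments above.
evals_per_sec = 20480          # reported: saturated 1080ti, batch 1024
moves_per_game = 60
evals_per_move = 800

cpp_sec_per_game = moves_per_game * evals_per_move / evals_per_sec
print(cpp_sec_per_game)        # ~2.34 s/game for the C++ engine

python_sec_per_game = 22       # mokemokechicken's reported figure
arch_penalty = 3               # guessed slowdown for the 256-filter net
print(python_sec_per_game / (cpp_sec_per_game * arch_penalty))  # ~3.1x
```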
@fuzzthink I am sorry to have confused you.
Now, "challenge 5 model" and "ch5 config" are strongest in my models.
Please remove (or rename) data/model/next_generation/ first. For example:

rm -rf data/model/next_generation/
sh ./download_model.sh 5

# run as wxPython GUI
python src/reversi_zero/run.py play_gui --type ch5

If you want to use it as an NBoard engine, please use
Please share your reversi model achievements!
Models that are not from reversi-alpha-zero are also welcome.
Battle records, configuration, repository URL, comments, and so on.