Performance Reports #40
There is a lot of information here.
I have some model reports in
I have started a recent run. This is using ggp-zero (the reversi-alpha-zero implementation was an inspiration!). ggp-zero is a generic implementation of a 'zero' method and can train many different games. By 'zero' I mean starting with a random network and training via self-play using (PUCT or a variant of) MCTS. However, at this point the implementation (and goals) are quite divergent from AlphaZero (I also drew inspiration from 'Thinking Fast and Slow'). For this run, I am running with multiple policy heads and multiple value heads, with no turn flipping and no symmetry/rotation of the network. A previous run reached approximately NTest level 7, but there were no records. I will keep this up to date this time!
@gooooloo what does the ntest:6 result (6/1/3) for step-418500 mean?
@apollo-time in my report, ntest:6 means the opponent is NTest with strength 6, and 6/1/3 means 6 wins, 1 draw, 3 losses. By the way, you can find the win/draw game saves at https://github.com/gooooloo/reversi-alpha-zero-models/tree/master/ggf
@gooooloo thanks for the reply. I see your model is very good. My model can't beat NTest depth 5 yet.
policy loss: 0.4
@gooooloo Um... but you use game history, don't you?
Yes. I am actually guessing that the number of ResNet blocks in the model should be reduced depending on the number of historical boards: the fewer historical boards, the shallower the model should be. There is a bigger chance of overfitting if the network's input space is large. I sometimes observe a strange, bad move from my model right after several strong moves, which I cannot understand. Overfitting could be an explanation, but I cannot be sure.
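To make the input-space point concrete, here is a minimal sketch of an AlphaZero-style plane encoding for 8x8 reversi, where the plane count grows linearly with the number of historical boards. The exact layout in reversi-alpha-zero may differ, and `encode_input` is an illustrative name, not the repo's API:

```python
import numpy as np

# Illustrative AlphaZero-style input encoding for 8x8 reversi:
# T historical positions -> 2*T stone planes plus one side-to-move plane,
# so more history means a larger input space for the network to fit.
def encode_input(own_history, opp_history, history_length):
    """own_history/opp_history: lists of 8x8 binary boards, newest first.
    Returns an array of shape (2*history_length + 1, 8, 8)."""
    planes = []
    for t in range(history_length):
        planes.append(own_history[t])   # own stones at time t
        planes.append(opp_history[t])   # opponent stones at time t
    planes.append(np.ones((8, 8)))      # constant side-to-move plane
    return np.stack(planes)

boards = [np.zeros((8, 8)) for _ in range(8)]
print(encode_input(boards, boards, 1).shape)  # (3, 8, 8)  -- no extra history
print(encode_input(boards, boards, 8).shape)  # (17, 8, 8) -- 8 boards of history
```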
I finished up my latest run, ending up somewhere between ntest 5-10, depending on the phase of the moon. Not too shabby. |
@richemslie congrats on getting the result. ntest5 is already strong, I feel.
I have a detailed evaluation metric here. The model that beats ntest6 with 6/1/3 is step-418500 with 400 sims; it loses to ntest7... But it beats ntest9 with 7/0/3. It seems to me that beating ntest:x doesn't imply beating ntest:(x-1) or ntest:(x-2)...
@gooooloo I looked at your player's GameTree and I see you subtract virtual loss only after all simulations (800).
Do you mean I should subtract virtual loss after every simulation, not wait until all sims are finished? That's what I do inside backup(). My function naming is not accurate though; I will refine it later...
I just changed the implementation to a single-threaded approach 3 days ago. I think the efficiency is ... Besides, I am using ... Yet my model was mostly trained with this version of the GameTree implementation. You can check that too.
@gooooloo Oh, I see you subtract virtual loss inside backup(). |
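For readers following this exchange, here is a minimal sketch of the per-simulation virtual-loss pattern being described, with illustrative names rather than the repo's actual GameTree code: virtual loss is applied while descending the tree and reverted inside backup(), once per simulation.

```python
# Sketch of per-simulation virtual loss in parallel MCTS (illustrative).
VIRTUAL_LOSS = 3

class Node:
    def __init__(self):
        self.n = 0    # visit count
        self.w = 0.0  # accumulated value

def apply_virtual_loss(path):
    # During selection: make the path look worse so that concurrent
    # simulations are steered toward different branches.
    for node in path:
        node.n += VIRTUAL_LOSS
        node.w -= VIRTUAL_LOSS

def backup(path, value):
    # After evaluation: revert the virtual loss and record the real
    # result -- this runs once per simulation, not after all 800.
    for node in path:
        node.n += 1 - VIRTUAL_LOSS
        node.w += value + VIRTUAL_LOSS

path = [Node(), Node()]
apply_virtual_loss(path)
backup(path, value=1.0)
print(path[0].n, path[0].w)  # 1 1.0 -- virtual loss fully undone
```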
Report: my step-473800 model with a 30-minutes-per-game time control beats ntest lv7-10, draws with lv11, beats lv12-13, and loses to lv14 (2 draws, 2 losses). The game model can be found here; the evaluation game saves can be found here. Why use 30 min? Actually I am targeting 5 min with a C++ implementation on an 8-core-CPU, 1-GPU machine. But I don't have a C++ implementation yet, so I am trying the current Python implementation with 30 min. I believe they will be similar in simulation_number_per_move, which is about 13000. And I believe 5 minutes for a C++ implementation is a reasonable time control for evaluating the AI's strength.
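As a rough sanity check of the 13000 sims/move figure, here is a back-of-the-envelope calculation. It assumes the 30 minutes are one player's clock and that a reversi game runs about 60 plies; both are assumptions, not from the report:

```python
# Back-of-the-envelope: implied simulation throughput for the 30-minute
# time control above. Clock interpretation and move count are assumptions.
time_per_player_s = 30 * 60   # 30-minute clock per player (assumed)
moves_per_player = 30         # ~60 plies total in a reversi game (assumed)
sims_per_move = 13000         # figure from the report

time_per_move_s = time_per_player_s / moves_per_player
print(time_per_move_s)                    # 60.0 seconds per move
print(sims_per_move / time_per_move_s)    # ~217 simulations per second
```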
Report: I am getting Evaluator results as below. Many draws against the last-generation model. That is interesting. In this case, even if I am lucky enough to get a new model later with 10 more wins than losses, it is hard to be convinced that it is a better model; I would regard it as just randomness. I think maybe this is exactly the reason DeepMind doesn't use the AlphaGo Zero approach for Chess and Shogi. Yes, they mention that reason in the AlphaZero paper, but I think I am experiencing it myself now ^^ Will change to the AlphaZero approach soon...
@gooooloo I think one of the advantages of AlphaZero is that it always switches to the latest model.
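For context, a minimal sketch of the two promotion schemes being contrasted. The 55% threshold is from the AlphaGo Zero paper; `play_match` is a hypothetical helper returning (wins, draws, losses), not code from this repo:

```python
# AlphaGo Zero gates candidates behind an evaluator; AlphaZero does not.
def alphago_zero_promote(candidate, best, play_match, n_games=400):
    wins, draws, losses = play_match(candidate, best, n_games)
    decided = wins + losses
    # Many draws leave few decided games, so the gate becomes noisy --
    # the problem described in the report above.
    if decided == 0:
        return best
    return candidate if wins / decided > 0.55 else best

def alphazero_promote(candidate, best):
    # AlphaZero always continues from the latest network, no gate.
    return candidate
```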
Wondering which is currently the strongest model. Is Challenge 1 still the strongest? Also, |
Hi - new record for gzero (of a different kind - playing equal to ntest level 3 after only 12 hours of training). Discovered a pretty bad bug with PUCT constants this morning, which is the reason for a new run. Some points:
@richemslie Your C++ implementation looks very interesting. I'm curious how many self-play games per second or minute you can generate. Taking into account your small architecture, it may be possible to compare your speed with that of Python-only implementations.
@AranKomat - it is hard to give exact numbers for comparison, but using a batch size of 1024 on a 1080ti card, it can be 100% saturated with 2 C++ threads. I am seeing 20480 model evaluations per second, and hence for ~60 moves at 800 evaluations per move, that is about 2.3 seconds per game. For the tiny (initial) network (8 res blocks, 64 filters, 64 hidden), it took 3 C++ threads to saturate, and it was about twice as fast (1.1 seconds per game).
@mokemokechicken achieved 22 seconds per game with 256 filters, a single 1080ti, and half as many sims/move. I assume it would take 22s per game with two 1080ti's and 800 sims/move. Though the architecture's FLOPS are (256/96)^2 ≈ 7 times larger, GPU and TF performance scale weirdly. So, if I assume yours would take 2.3×3 s per game with the same architecture as moke's, the speedup from using C++ is probably 22/(2.3×3) ≈ 3.2 times? I'm looking forward to your updates!
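The arithmetic from the two comments above, spelled out. The ×3 architecture penalty for the C++ engine is the guess made in the comment, not a measurement:

```python
# Throughput arithmetic from the comments above.
evals_per_sec = 20480          # reported: saturated 1080ti, batch 1024
moves_per_game = 60
evals_per_move = 800

cpp_sec_per_game = moves_per_game * evals_per_move / evals_per_sec
print(cpp_sec_per_game)        # ~2.34 s/game for the C++ engine

python_sec_per_game = 22       # mokemokechicken's reported figure
arch_penalty = 3               # guessed slowdown for the 256-filter net
print(python_sec_per_game / (cpp_sec_per_game * arch_penalty))  # ~3.1x
```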
@fuzzthink I am sorry to have confused you.
Now, "challenge 5 model" and "ch5 config" are strongest in my models.
Please remove (or rename) data/model/next_generation/ first. For example:

rm -rf data/model/next_generation/
sh ./download_model.sh 5

# run as wxPython GUI
python src/reversi_zero/run.py play_gui --type ch5

If you want to use it as an NBoard engine, please use
Please share your reversi model achievements!
Models that are not from reversi-alpha-zero are also welcome.
Battle records, configuration, repository URL, comments, and so on.