Why does the agent get a mean score of 21 during training but only 6.2 during testing? #7
Comments
Judging by this, testing is done with a deterministic policy by default, i.e. instead of sampling an action from the policy's distribution, the agent picks the action with the highest probability. As for evaluating the agent's performance during training, people often just print and report the performance observed during training (the printout you see while training runs). Reinforcement learning is funny in the sense that your test set is your training set (unless you want to evaluate something specific).
Thanks for your explanation! You mention the deterministic policy used by default when testing; where does the deterministic policy come from? Does it come from the trained model?
The policy comes from the trained model, yes. In this case "the policy" (i.e. our agent) outputs a probability for each action given a state, and these probabilities correspond to what the agent has learned (higher probability = better action). You can turn these probabilities into an action in at least two ways:

1. Sample an action from the probability distribution (stochastic).
2. Always pick the action with the highest probability (deterministic).
The second one might at first sound like the better approach: we want to pick the best action in each state, after all. However, there are publications and toy examples showing that stochasticity can improve performance (or is even required for high reward). From my personal experience, sampling an action works better with A3C in ViZDoom.
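To make the two options concrete, here is a minimal NumPy sketch of both selection rules. The function name `pick_action` and the example probabilities are made up for illustration; this is not the repository's actual code.

```python
import numpy as np

rng = np.random.default_rng()

def pick_action(policy_probs, deterministic=False):
    """Select an action from the policy's output probabilities.

    policy_probs: 1-D array of per-action probabilities summing to 1.
    deterministic=True  -> argmax (the test-time default described above).
    deterministic=False -> sample from the distribution (what A3C does
                           during training).
    """
    if deterministic:
        # Always take the action the policy currently considers best.
        return int(np.argmax(policy_probs))
    # Sample an action index in proportion to its probability.
    return int(rng.choice(len(policy_probs), p=policy_probs))

# Example: a policy that slightly prefers action 2.
probs = np.array([0.2, 0.3, 0.5])
print(pick_action(probs, deterministic=True))   # always 2
print(pick_action(probs, deterministic=False))  # 0, 1, or 2 at random
```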
Thanks a lot Miffyli for your answers!
Today I used --hide-window when testing:

d3alg@ubuntu-59:/home/lab/wsy/deep_rl_vizdoom$ CUDA_VISIBLE_DEVICES=4 ./test_a3c.py models/defend_center/example_a3c_center/ACLstmNet/05.30_14-42 -e 10 --hide-window --seed 123

Why is it that --hide-window defaults to true during training, but defaults to false when testing from the command line? That is why we can see the game window during testing but not during training.
Perhaps there is a bug in my code or in the new version of ViZDoom. I will investigate it later. Sorry for the late response.
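For illustration, the kind of mismatch described above can arise when two entry-point scripts build their argument parsers with different defaults. The sketch below is hypothetical (it is not the repository's actual code; the helper `build_parser` is invented), but it shows the mechanism:

```python
import argparse

def build_parser(default_hide_window):
    # Each script could construct its own parser with its own default
    # for the same flag, producing different behaviour at train/test time.
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--hide-window",
        action="store_true",
        default=default_hide_window,
        help="Run ViZDoom without rendering a game window.",
    )
    return parser

# If the training and testing scripts use different defaults...
train_args = build_parser(default_hide_window=True).parse_args([])
test_args = build_parser(default_hide_window=False).parse_args([])

print(train_args.hide_window)  # True  -> no window during training
print(test_args.hide_window)   # False -> window shown during testing
```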
Hi~ I trained the agent with A3C. Are there any differences between the training and testing sets? The agent got a mean score of 21 during training but only 6 when testing. Is there any difference between how the agent is evaluated during training and during standalone testing?
Here is the training result at epoch 30:
EPOCH 30
TRAIN: 15360164 (GlobalSteps), 110 episodes, mean: 21.682±1.97, min: 11.000, max: 25.000,
LocalSpd: 282 STEPS/s, GlobalSpd: 524 STEPS/s, 1.89M STEPS/hour, total elapsed time: 8h 8m 44s
TEST: mean: 23.040±1.50, min: 18.000, max: 25.000, test time: 1m 10s
Learning rate: 2.38661577896e-05
Saving model to: models/defend_center/example_a3c_center/ACLstmNet/05.30_14-42/model
Time: 22:52
Here is the testing result for 10 episodes:
d3alg@ubuntu-59:/home/lab/wsy/deep_rl_vizdoom$ CUDA_VISIBLE_DEVICES=4 ./test_a3c.py models/defend_center/example_a3c_center/ACLstmNet/05.30_14-42 -e 10 --seed 123
Mean score: 6.200