
Why does it get a mean score of 21 when training but only 6.2 when testing? #7

Open
ouyangzhuzhu opened this issue May 31, 2018 · 6 comments

Comments

@ouyangzhuzhu

ouyangzhuzhu commented May 31, 2018

Hi~ I trained the agent with A3C. Is there any difference between how the agent is evaluated during training and during testing? It got a mean score of 21 during training but only 6 when testing.

Here is the training result at epoch 30:

EPOCH 30
TRAIN: 15360164(GlobalSteps), 110 episodes, mean: 21.682±1.97, min: 11.000, max: 25.000,
LocalSpd: 282 STEPS/s GlobalSpd: 524 STEPS/s, 1.89M STEPS/hour, total elapsed time: 8h 8m 44s TEST: mean: 23.040±1.50, min: 18.000, max: 25.000, test time: 1m 10s
Learning rate: 2.38661577896e-05
Saving model to: models/defend_center/example_a3c_center/ACLstmNet/05.30_14-42/model
Time: 22:52

Here is the testing result for 10 episodes:

d3alg@ubuntu-59:/home/lab/wsy/deep_rl_vizdoom$ CUDA_VISIBLE_DEVICES=4 ./test_a3c.py models/defend_center/example_a3c_center/ACLstmNet/05.30_14-42 -e 10 --seed 123
Mean score: 6.200

@Miffyli
Collaborator

Miffyli commented May 31, 2018

Judging by this, testing is done with a deterministic policy by default. I.e. instead of sampling an action from the policy's distribution, the agent picks the action with the highest probability.

When it comes to evaluating the performance of an agent during training, people often just print and report the performance during training (the printout you see during training). Reinforcement learning is funny in the sense that your test set is your training set (unless you want to evaluate something specific).

@ouyangzhuzhu
Author

Thanks for your explanation. You mention that testing uses a deterministic policy by default; where does the deterministic policy come from? Does it come from the trained model?

@Miffyli
Collaborator

Miffyli commented May 31, 2018

Yes, the policy comes from the trained model. In this case "the policy" (i.e. our agent) outputs a probability for each action given a state, and these probabilities correspond to what the agent has learned (higher probability = better action).

You can deal with these probabilities in at least two ways:

  1. You pick a random action according to the probabilities provided by the policy (higher probability = higher chance of picking that action). This is non-deterministic, since you won't always pick the same action for the same state. Or,
  2. Pick the action with the highest probability. This is deterministic, since you always pick the same action for the same state.

The second one might at first sound like the better approach: we want to pick the best action in each state, after all. However, there are publications and toy examples showing that stochasticity can improve performance (or is even required for high reward). In my personal experience, sampling actions with A3C in ViZDoom works better.
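To make the difference concrete, here is a minimal NumPy sketch of the two options. This is not this repo's actual code; select_action and its arguments are made up for illustration, assuming the policy network's softmax output for one state is available as a 1-D probability array:

import numpy as np

def select_action(action_probs, deterministic=False):
    # action_probs: 1-D array of per-action probabilities (the policy's
    # softmax output for the current state), summing to 1.
    if deterministic:
        # Option 2: always take the most probable action (deterministic).
        return int(np.argmax(action_probs))
    # Option 1: sample an action proportionally to its probability (stochastic).
    return int(np.random.choice(len(action_probs), p=action_probs))

For example, with action_probs = [0.7, 0.2, 0.1] the deterministic variant always returns action 0, while sampling returns action 0 only about 70% of the time.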

@ouyangzhuzhu
Author

Thanks a lot Miffyli for your answers!

@ouyangzhuzhu
Author

Today I used --hide-window when testing:

d3alg@ubuntu-59:/home/lab/wsy/deep_rl_vizdoom$ CUDA_VISIBLE_DEVICES=4 ./test_a3c.py models/defend_center/example_a3c_center/ACLstmNet/05.30_14-42 -e 10 --hide-window --seed 123

and now the score is Mean score: 21.45!

Why? During training --hide-window defaults to true, but when testing from the command line it defaults to false; that's why we can see the screen when testing but not when training. But how can this setting cause the score gap? I can't figure it out.

@mihahauke
Owner

Perhaps there is some bug in my code or in the new version of ViZDoom. I will investigate it later. Sorry for the late response.
