
Why does it get a mean score of 21 when training but only 6.2 when testing? #7

Open
ouyangzhuzhu opened this issue May 31, 2018 · 6 comments

Comments

@ouyangzhuzhu

ouyangzhuzhu commented May 31, 2018

Hi~ I trained the agent with A3C. Is there any difference between how the agent is evaluated during training and during testing? It got a mean score of 21 during training but only 6 when testing.

Here is the training result at epoch 30:

EPOCH 30
TRAIN: 15360164(GlobalSteps), 110 episodes, mean: 21.682±1.97, min: 11.000, max: 25.000,
LocalSpd: 282 STEPS/s GlobalSpd: 524 STEPS/s, 1.89M STEPS/hour, total elapsed time: 8h 8m 44s TEST: mean: 23.040±1.50, min: 18.000, max: 25.000, test time: 1m 10s
Learning rate: 2.38661577896e-05
Saving model to: models/defend_center/example_a3c_center/ACLstmNet/05.30_14-42/model
Time: 22:52

Here is the testing result for 10 episodes:

d3alg@ubuntu-59:/home/lab/wsy/deep_rl_vizdoom$ CUDA_VISIBLE_DEVICES=4 ./test_a3c.py models/defend_center/example_a3c_center/ACLstmNet/05.30_14-42 -e 10 --seed 123
Mean score: 6.200

@Miffyli
Collaborator

Miffyli commented May 31, 2018

Judging by this, testing is done with a deterministic policy by default. I.e. instead of sampling an action from the policy's distribution, the agent picks the action with the highest probability.

When it comes to evaluating the performance of an agent during training, people often just print and report the performance during training (the printout you see during training). Reinforcement learning is funny in the sense that your test set is your training set (unless you want to evaluate something specific).

@ouyangzhuzhu
Author

Thanks for your explanation. You mention that testing uses a deterministic policy by default; where does the deterministic policy come from? Does it come from the trained model?

@Miffyli
Collaborator

Miffyli commented May 31, 2018

Yes, the policy comes from the trained model. In this case "the policy" (i.e. our agent) outputs a probability for each action given a state, and these probabilities correspond to what the agent has learned (higher probability = better action).

You can deal with these probabilities in at least two ways:

  1. You pick a random action according to the probabilities provided by the policy (higher probability = higher chance of picking that action). This is non-deterministic, since you won't always pick the same action for the same state. Or,
  2. Pick the action with the highest probability. This is deterministic, since you always pick the same action for the same state.

The second one might at first sound like the better approach: we want to pick the best action in each state, after all. However, there are publications and toy examples showing that stochasticity can improve performance (or is even required for high reward). In my personal experience, sampling actions with A3C in ViZDoom works better.
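To make the difference concrete, here is a minimal NumPy sketch of the two options. This is not this repo's actual code; select_action and its arguments are made up for illustration, assuming the policy network's softmax output for one state is available as a 1-D probability array:

import numpy as np

def select_action(action_probs, deterministic=False):
    # action_probs: 1-D array of per-action probabilities (the policy's
    # softmax output for the current state), summing to 1.
    if deterministic:
        # Option 2: always take the most probable action (deterministic).
        return int(np.argmax(action_probs))
    # Option 1: sample an action proportionally to its probability (stochastic).
    return int(np.random.choice(len(action_probs), p=action_probs))

For example, with action_probs = [0.7, 0.2, 0.1] the deterministic variant always returns action 0, while sampling returns action 0 only about 70% of the time.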

@ouyangzhuzhu
Author

Thanks a lot Miffyli for your answers!

@ouyangzhuzhu
Author

Today I used --hide-window when testing:

d3alg@ubuntu-59:/home/lab/wsy/deep_rl_vizdoom$ CUDA_VISIBLE_DEVICES=4 ./test_a3c.py models/defend_center/example_a3c_center/ACLstmNet/05.30_14-42 -e 10 --hide-window --seed 123

and now the score is Mean score: 21.45!

Why? During training --hide-window defaults to true, but when testing from the command line it defaults to false; that's why we can see the screen when testing but not when training. But how can this setting cause the score gap? I can't figure it out.

@mihahauke
Owner

Perhaps there is some bug in my code or in the new version of ViZDoom. I will investigate it later. Sorry for the late response.
