output.txt
Sending build context to Docker daemon 190kB
Step 1/6 : FROM python:3.8.5
---> 62aa40094bb1
Step 2/6 : RUN pip install numpy==1.19
---> Using cache
---> 2bdb87e7497a
Step 3/6 : RUN pip install gym==0.17
---> Using cache
---> cab2ee18195e
Step 4/6 : COPY ./*.py /usr/src/
---> 3e1d23334423
Step 5/6 : WORKDIR /usr/src
---> Running in 8216bc11f174
Removing intermediate container 8216bc11f174
---> 722ac26f28e8
Step 6/6 : CMD ["python", "main.py"]
---> Running in f010825a5388
Removing intermediate container f010825a5388
---> a0023a0c9065
Successfully built a0023a0c9065
Successfully tagged taxi:latest
Begin Taxi-v3 learning...
Parameters:
path_memory_decay=0.7, Q_weight=0.02, recency_weight=3, eps_start=-0.005, eps_decay=5e-05
learning_rate=0.7, discount_rate=0.5, min_stochasticity=0
Episode 10000/150000 || epsilon=0.6035056, Best average reward -87.38
Episode 20000/150000 || epsilon=0.3660446, Best average reward -18.69
Episode 30000/150000 || epsilon=0.2220173, Best average reward -3.92
Episode 40000/150000 || epsilon=0.1346603, Best average reward 1.88
Episode 50000/150000 || epsilon=0.0816756, Best average reward 5.42
Episode 60000/150000 || epsilon=0.0495388, Best average reward 6.34
Episode 70000/150000 || epsilon=0.0300468, Best average reward 7.82
Episode 80000/150000 || epsilon=0.0182243, Best average reward 8.15
Episode 90000/150000 || epsilon=0.0110536, Best average reward 8.29
Episode 100000/150000 || epsilon=0.0067043, Best average reward 8.34
Episode 110000/150000 || epsilon=0.0040664, Best average reward 8.72
Episode 120000/150000 || epsilon=0.0024664, Best average reward 8.72
Episode 130000/150000 || epsilon=0.0014959, Best average reward 8.72
Episode 140000/150000 || epsilon=0.0009073, Best average reward 9.26
Episode 150000/150000 || epsilon=0.0005503, Best average reward 9.26
Run 3/1, average so far=9.26
Local seed: 80
Average: 9.26
Median: 9.26
[9.26]
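
Note on the epsilon column: the logged values are consistent with an exponential schedule of the form epsilon = exp(eps_start - eps_decay * episode), using the eps_start and eps_decay values printed under "Parameters:". This is inferred from the numbers in the log only. The function name epsilon_at and the formula itself are assumptions, not code taken from main.py. A minimal sketch that recomputes the column:

```python
import math

# Values copied from the "Parameters:" line of the log.
EPS_START = -0.005
EPS_DECAY = 5e-05


def epsilon_at(episode: int) -> float:
    """Hypothetical schedule inferred from the logged values (assumption)."""
    return math.exp(EPS_START - EPS_DECAY * episode)


if __name__ == "__main__":
    # Recompute epsilon at the same checkpoints the log reports.
    for episode in range(10_000, 150_001, 10_000):
        print(f"Episode {episode}/150000 || epsilon={epsilon_at(episode):.7f}")
```

If the assumption holds, the printed values should agree with the log to the digits shown; a mismatch would indicate that main.py uses a different schedule.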