BASE／IQL_S3 v20221003

Model

Value Network `V(s)`

Encoder

Type: Transformer encoder layers (the same network structure as the one used for BERT_BASE)
- Dimension: 768
- # of heads: 12
- Dimension of feedforward networks: 3072
- # of layers: 12
- Activation function: GELU
- Dropout rate in training: 0.1
- Initialization: Random
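
As a rough sanity check on the encoder size, the hyperparameters above can be collected into a small config object. This is a minimal sketch, not the project's code: the class name `EncoderConfig` and the helper `encoder_param_count` are hypothetical, and the count assumes the usual Transformer encoder layer (four attention projections with biases, a two-matrix feedforward block with biases, and two LayerNorms), with input embeddings excluded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderConfig:
    """Hypothetical container for the encoder hyperparameters listed above."""
    dimension: int = 768
    num_heads: int = 12
    dim_feedforward: int = 3072
    num_layers: int = 12
    activation: str = "gelu"
    dropout: float = 0.1

    @property
    def head_dim(self) -> int:
        # Each attention head works on an equal slice of the model dimension.
        assert self.dimension % self.num_heads == 0
        return self.dimension // self.num_heads


def encoder_param_count(cfg: EncoderConfig) -> int:
    """Parameter count of the stacked layers, assuming the standard layout."""
    d, ff = cfg.dimension, cfg.dim_feedforward
    attention = 4 * (d * d + d)            # Q, K, V and output projections
    feedforward = d * ff + ff + ff * d + d  # two linear maps with biases
    layer_norms = 2 * (2 * d)              # scale and shift, twice per layer
    return cfg.num_layers * (attention + feedforward + layer_norms)


cfg = EncoderConfig()
print(cfg.head_dim, encoder_param_count(cfg))
```

Under these assumptions each head has dimension 64 and the twelve layers hold about 85 million parameters, which matches the commonly quoted encoder size of BERT_BASE without its embeddings.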

Decoder

Type: Single-layer position-wise feedforward network
- Dimension: 3072
- Activation function: GELU
- Dropout rate in training: 0.1
- Initialization: Random

Q Network `Q(s, a)`

Encoder

Type: Transformer encoder layers (the same network structure as the one used for BERT_BASE)
- Dimension: 768
- # of heads: 12
- Dimension of feedforward networks: 3072
- # of layers: 12
- Activation function: GELU
- Dropout rate in training: 0.1
- Initialization: Random

Decoder

Type: Dueling network with two single-layer position-wise feedforward networks
- Dimension: 3072
- Activation function: GELU
- Dropout rate in training: 0.1
- Initialization: Random
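
The page does not say how the two streams of the dueling decoder are combined. The sketch below shows the common mean-subtracted form, Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a'), purely as an illustration. The function name `dueling_q_values` is hypothetical, and the actual implementation may aggregate differently (for example, over legal actions only).

```python
def dueling_q_values(state_value, advantages):
    """Combine a scalar state value with per-action advantages.

    Assumption: mean-subtracted aggregation, which is the usual choice for
    dueling networks; the source page does not specify the aggregation.
    """
    if not advantages:
        raise ValueError("at least one action is required")
    mean_advantage = sum(advantages) / len(advantages)
    # Subtracting the mean makes the value/advantage split identifiable:
    # the average of the resulting Q-values equals the state value.
    return [state_value + a - mean_advantage for a in advantages]


q_values = dueling_q_values(1.0, [0.5, -0.5, 0.0])
print(q_values)
```

With this form the first stream carries how good the state is overall, while the second only has to rank the candidate actions relative to each other.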

Objective

Type: Implicit Q-learning (IQL)
- Reward: Change in grading points over one game, computed as for a Saint 3 player in the Jade room

Data

Crawled Game Records

Crawled Game Records v202007_202109

Training Examples

110,000,000 examples, sampled at random from the crawled game records and shuffled.

(One of the Q losses jumps sharply after 120,000,000 samples, so the checkpoint is taken at the point of lowest loss before that jump.)
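
For scale, the sample count can be converted into optimizer steps using the batch size of 65536 listed under Optimization. This is back-of-the-envelope arithmetic only, assuming one optimizer step per batch with no gradient accumulation; the constant names are made up for the example.

```python
NUM_SAMPLES = 110_000_000  # training examples stated above
BATCH_SIZE = 65_536        # batch size listed under Optimization

# Assumption: one optimizer step per full batch.
full_steps, leftover_samples = divmod(NUM_SAMPLES, BATCH_SIZE)
print(full_steps, leftover_samples)
```

Under that assumption the run amounts to 1678 full batches, with 30592 samples left over.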

Optimization

Implicit Q-learning (IQL)

Discount factor (γ): 1.0
Expectile (τ): 0.90
Soft update (Polyak averaging) rate of target networks (α): 0.1
Optimizer: LAMB
Learning rate: 0.001
ε: 1.0e-6
Batch size: 65536
# of training epochs: N/A
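
To make the roles of γ, τ, and α concrete, here is a minimal scalar sketch of the quantities IQL optimizes, written in plain Python rather than the project's training code. All function names are hypothetical. It assumes the usual formulation: V is fit to the target Q network with an expectile loss, Q is fit to a one-step TD target built from V, and the target network is updated as `target = (1 - alpha) * target + alpha * online`.

```python
def expectile_loss(diff, tau=0.9):
    """Asymmetric squared error |tau - 1(diff < 0)| * diff**2."""
    weight = tau if diff > 0 else 1.0 - tau
    return weight * diff * diff


def value_loss(target_q_values, values, tau=0.9):
    """Mean expectile loss of V(s) against the target network's Q(s, a)."""
    diffs = [q - v for q, v in zip(target_q_values, values)]
    return sum(expectile_loss(d, tau) for d in diffs) / len(diffs)


def td_target(reward, next_value, done, gamma=1.0):
    """One-step target r + gamma * V(s'), with no bootstrap at game end."""
    return reward + (0.0 if done else gamma * next_value)


def q_loss(q_value, reward, next_value, done, gamma=1.0):
    """Squared TD error of Q(s, a) against the V-based target."""
    return (td_target(reward, next_value, done, gamma) - q_value) ** 2


def soft_update(target_params, online_params, alpha=0.1):
    """Polyak averaging: move each target parameter toward the online one."""
    return [(1.0 - alpha) * t + alpha * o
            for t, o in zip(target_params, online_params)]


print(expectile_loss(2.0), expectile_loss(-2.0))
print(td_target(5.0, 3.0, done=False), td_target(5.0, 3.0, done=True))
print(soft_update([0.0, 1.0], [1.0, 1.0]))
```

With τ = 0.90, an underestimate of Q by V is penalized nine times as heavily as an overestimate of the same size, which pushes V toward the upper range of the Q-values seen in the data. With γ = 1.0 the return is undiscounted, so the value of a state is simply the expected final reward.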

Graphs of Loss Functions and Their Gradient Norms

Graph of Gradient Norm of Q Losses

[Figure: q-gradient-norm]

Graph of Q Losses

[Figure: q-loss]

Graph of Gradient Norm of V Loss

[Figure: value-gradient-norm]

Graph of V Loss

[Figure: value-loss]

Advantage Weighted Regression (AWR)

(TODO)

Quantitative Comparison with BASE／BC_H13 v20220210 as the Baseline

(TODO)
