
Supervised Learning for RL NN

As a starting point for further reinforcement learning (RL) of the Tenhou client, this model was trained on games played by Tenhou experts.

Data

Source

The data was downloaded from 天鳳位. The files, which were in a broken XML format, have been converted into well-formed XML and placed in xml files.
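As a rough illustration of how such XML logs can be read, the sketch below extracts draw and discard events. It assumes the commonly seen Tenhou log layout, where draws are tags `T`/`U`/`V`/`W` and discards are tags `D`/`E`/`F`/`G` followed by a tile id from 0 to 135. The sample string and the `parse_events` helper are hypothetical and are not taken from this project.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, hand-written fragment in the assumed Tenhou log layout;
# it is not taken from the dataset described above.
SAMPLE_LOG = (
    '<mjloggm ver="2.3">'
    '<T52/><D52/><U60/><E60/><DORA hai="8"/>'
    '</mjloggm>'
)

DRAW_TAGS = "TUVW"     # draw by player 0..3 (assumed convention)
DISCARD_TAGS = "DEFG"  # discard by player 0..3 (assumed convention)
EVENT_RE = re.compile(r"^([TUVWDEFG])(\d+)$")


def parse_events(xml_text):
    """Return (kind, player, tile_type) tuples for draws and discards."""
    events = []
    for element in ET.fromstring(xml_text):
        match = EVENT_RE.match(element.tag)
        if match is None:
            continue  # skip other tags such as DORA
        letter, tile_id = match.group(1), int(match.group(2))
        if letter in DRAW_TAGS:
            kind, player = "draw", DRAW_TAGS.index(letter)
        else:
            kind, player = "discard", DISCARD_TAGS.index(letter)
        # 136 physical tiles map onto 34 tile types (4 copies each).
        events.append((kind, player, tile_id // 4))
    return events


for event in parse_events(SAMPLE_LOG):
    print(event)
```

Each (state, action) training pair would then be built from the events seen so far; how the 74-element state is derived from them is not specified here.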

Input State

The input is a 1D integer vector of length 74:

Output Action

The output is a 1D integer vector of length 42:

A 1D state vector may not be a good choice, because it makes it hard to detect relationships between the player's own tiles and the discarded tiles.

Architecture

The architecture of the NN is roughly as follows:

| Layer  | Activation |
|--------|------------|
| CNN    | ReLU       |
| Dense  | ReLU       |
| LSTM   | tanh       |
| Output | Softmax    |

Several variations of this architecture were tried.
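The layer stack above can be sketched as follows. This is a minimal illustration in PyTorch, which is a choice made for this example and not necessarily the framework used in the project. The kernel size (4), kernel number (8), and dense width (128) come from the Training section. The use of two convolution layers and one dense layer, the LSTM hidden size, and the name `PolicyNet` are assumptions.

```python
import torch
from torch import nn


class PolicyNet(nn.Module):
    """CNN -> Dense -> LSTM -> Softmax over a sequence of 1D states."""

    def __init__(self, state_len=74, n_actions=42,
                 n_kernels=8, kernel_size=4, dense_nodes=128):
        super().__init__()
        # Two unpadded 1D convolutions (assumed layer count), ReLU after each.
        self.conv = nn.Sequential(
            nn.Conv1d(1, n_kernels, kernel_size), nn.ReLU(),
            nn.Conv1d(n_kernels, n_kernels, kernel_size), nn.ReLU(),
        )
        # Each unpadded convolution shortens the sequence by kernel_size - 1.
        conv_len = state_len - 2 * (kernel_size - 1)
        self.dense = nn.Sequential(
            nn.Linear(n_kernels * conv_len, dense_nodes), nn.ReLU(),
        )
        # nn.LSTM uses tanh for its cell output; hidden size is an assumption.
        self.lstm = nn.LSTM(dense_nodes, dense_nodes, batch_first=True)
        self.out = nn.Linear(dense_nodes, n_actions)

    def forward(self, states):
        # states: (batch, time, state_len)
        batch, time, state_len = states.shape
        x = states.reshape(batch * time, 1, state_len).float()
        x = self.conv(x).reshape(batch, time, -1)
        x = self.dense(x)
        x, _ = self.lstm(x)
        # Probability distribution over actions at every time step.
        return torch.softmax(self.out(x), dim=-1)


model = PolicyNet()
probs = model(torch.zeros(2, 5, 74))
print(probs.shape)
```

The convolutions are applied to each state independently, and only the LSTM mixes information across time steps.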

Training

The following tables summarize the hyperparameters and the results:

| Factor        | Value |
|---------------|-------|
| Learning rate | 0.01  |
| Kernel size   | 4     |
| Kernel number | 8     |
| Dense nodes   | 128   |
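| Normalized | Masked | CNN | Dense | LSTM | Epoch | Batch size | Eval Acc | Eval F1 | Test Acc | Test F1 |
|------------|--------|-----|-------|------|-------|------------|----------|---------|----------|---------|
| 0 ~ 1      | N      | 2   | 1     | Y    | 8     | 1          | 0.4079   | 0.5032  | 0.3559   | 0.0818  |
| 0 ~ 256    | N      | 2   | 1     | Y    | 8     | 1          | 0.4248   | 0.5382  | 0.6316   | 0.4611  |
| 0 ~ 7      | N      | 2   | 1     | Y    | 8     | 1          | 0.6937   | 0.5489  | 0.6943   | 0.5326  |
| 0 ~ 7      | N      | 0   | 2     | Y    | 8     | 1          | 0.03     | 0.05    | NA       | NA      |
| 0 ~ 7      | N      | 2   | 0     | Y    | 8     | 1          | 0.4009   | 0.4592  | NA       | NA      |
| 0 ~ 256    | Y      | 2   | 1     | Y    | 8     | 64         | 0.5145   | 0.4722  | NA       | NA      |
| 0 ~ 1      | Y      | 2   | 1     | Y    | 16    | 64         | 0.5927   | 0.5659  | NA       | NA      |
| 0 ~ 1      | Y      | 2   | 1     | Y    | 16    | 64         | 0.6531   | 0.6016  | 0.6485   | 0.6002  |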
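Accuracy and F1 can diverge considerably in the results above, which is typical when some actions are much more frequent than others. The sketch below shows one common way to compute both metrics for a multi-class action prediction. The averaging mode used for the reported F1 is not stated, so macro averaging is an assumption here, and the function names and toy labels are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true action."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging mode)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, made up for illustration.
y_true = [0, 0, 1, 2]
y_pred = [0, 1, 1, 2]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

With macro averaging, a model that only predicts the most frequent action can keep a moderate accuracy while its F1 stays low, because every rarely predicted class contributes a score near zero.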

Batch size = 1

Batch size = 64

Error

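A bidirectional LSTM was tested once and reached an evaluation accuracy above 99%. The reason is that the difference between the current state and the next state is exactly the action taken, and a bidirectional LSTM also reads the sequence backward, so it can see the next state when predicting the current action. The training was therefore meaningless, and the test result was poor.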
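The leak can be illustrated with a toy example. The actual 74-element state encoding is not described here, so the sketch below assumes a simplified state that only holds the count of each of the 34 tile types in the hand. `apply_discard` and `recover_action` are hypothetical helpers.

```python
N_TILE_TYPES = 34  # simplified toy state: one count per tile type


def apply_discard(hand_counts, tile):
    """Return the next state after discarding one copy of `tile`."""
    next_counts = list(hand_counts)
    next_counts[tile] -= 1
    return next_counts


def recover_action(state, next_state):
    """Read the discarded tile straight off the state difference."""
    diff = [before - after for before, after in zip(state, next_state)]
    return diff.index(1)


hand = [0] * N_TILE_TYPES
hand[5] = 2
hand[20] = 1
next_hand = apply_discard(hand, 5)

# No learning is needed: the label is fully determined by the two states.
print(recover_action(hand, next_hand))
```

Any model that sees both states, as the backward direction of a bidirectional LSTM does, can reach near-perfect accuracy this way without learning anything about tile choice.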