
Chapter08

Notes

8.1

Total steps = 8000; the model changes at step 4000. Red = Dyna, green = Dyna+.
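For reference, a minimal sketch of what separates the two curves, assuming "dyna+" here means Dyna-Q+ with the usual exploration bonus of kappa times the square root of the time since a state-action pair was last tried. The function name and the default `kappa` are hypothetical and are not taken from the code behind the plots.

```python
import math

def planning_reward(reward, steps_since_tried, kappa=1e-3):
    """Reward used in a Dyna-Q+ style planning update (assumed form).

    The modelled reward gets a bonus that grows with the number of
    time steps since the state-action pair was last tried for real.
    """
    return reward + kappa * math.sqrt(steps_since_tried)

# A pair tried just now gets no bonus; a long-untried pair gets a larger one.
print(planning_reward(0.0, 0))
print(planning_reward(0.0, 4000))
```

With `kappa = 0` this reduces to plain Dyna, so the bonus is the only difference between the two agents in this sketch.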

Move The Block

[Figure: move the block]

Shortcut

[Figure: sc_block]

8.5

8.5.1

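It requires another table to estimate p(s'|s, a), for example from transition counts. Q(s, a) is then updated with an expected update over the possible next states: Q(s, a) ← ∑s' p(s'|s, a) [r + γ maxa' Q(s', a')]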
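A minimal sketch of that idea, assuming the table stores counts of observed (next state, reward) outcomes per state-action pair and that planning uses the expected update. `CountModel`, `expected_update`, and the value of `gamma` are hypothetical names and choices for illustration.

```python
from collections import defaultdict

class CountModel:
    """Estimates p(s', r | s, a) from observed transition counts."""

    def __init__(self):
        # counts[(s, a)][(s_next, r)] = number of times this outcome was seen
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, s, a, r, s_next):
        self.counts[(s, a)][(s_next, r)] += 1

    def distribution(self, s, a):
        """Empirical outcome probabilities; empty dict if (s, a) is unseen."""
        outcomes = self.counts[(s, a)]
        total = sum(outcomes.values())
        return {outcome: n / total for outcome, n in outcomes.items()}

def expected_update(Q, model, s, a, actions, gamma=0.95):
    """Replace Q(s, a) by the expectation over the modelled outcomes."""
    target = 0.0
    for (s_next, r), p in model.distribution(s, a).items():
        best_next = max(Q[(s_next, b)] for b in actions)
        target += p * (r + gamma * best_next)
    Q[(s, a)] = target
    return target

# Usage: two different outcomes observed for the same state-action pair.
model = CountModel()
Q = defaultdict(float)
model.update("s0", "right", 1.0, "s1")
model.update("s0", "right", 0.0, "s2")
expected_update(Q, model, "s0", "right", actions=["left", "right"])
```

The update should only be called for pairs that have been observed at least once; for an unseen pair the sketch simply sets the value to 0.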

8.5.2

Because learning from the real world and learning from the model are separate, what is learned from the model may lag behind changes in the real world.

8.5.3

Adjust the ratio of learning from the environment to learning from the model.
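One way to expose that ratio is the number of planning updates per real step. The sketch below assumes a Dyna-Q style loop with a simple last-seen model; `q_update`, `dyna_step`, `n_planning`, and the step sizes are hypothetical and only illustrate the knob, not a tuned solution.

```python
import random
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    """One-step Q-learning update."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def dyna_step(Q, model, transition, actions, n_planning, rng=random):
    """One real update followed by n_planning simulated updates.

    n_planning sets the ratio of model learning to environment learning;
    n_planning = 0 is plain Q-learning.
    """
    s, a, r, s_next = transition
    q_update(Q, s, a, r, s_next, actions)   # learn from the environment
    model[(s, a)] = (r, s_next)             # remember the latest outcome
    for _ in range(n_planning):             # learn from the model
        (ps, pa), (pr, ps_next) = rng.choice(list(model.items()))
        q_update(Q, ps, pa, pr, ps_next, actions)

# Usage: a lower n_planning leans more on fresh real experience.
Q = defaultdict(float)
model = {}
dyna_step(Q, model, ("s0", "right", 1.0, "s1"), ["left", "right"], n_planning=5)
```

Lowering `n_planning` after a suspected change means fewer updates from a possibly stale model, at the cost of slower learning while the model is still correct.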

8.6

It would strengthen the case for sample updates over expected updates.

Informal (non-rigorous) test result:

[Figure: value]
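A small simulation in the same spirit, separate from the result above, might look like the following. It assumes b successor states with random unit-variance values, a sample-average estimate (step size 1/t), and a budget of b samples, which costs about as much as one expected update. The function name, the distributions, and all constants are hypothetical.

```python
import random

def sample_update_rmse(probs, n_samples, n_trials=2000, seed=0):
    """RMS error of a sample-average estimate after n_samples sample updates.

    probs is the distribution over successor states; successor values are
    redrawn from a standard normal on every trial.
    """
    rng = random.Random(seed)
    sq_err = 0.0
    for _ in range(n_trials):
        values = [rng.gauss(0.0, 1.0) for _ in probs]
        true_value = sum(p * v for p, v in zip(probs, values))
        draws = rng.choices(values, weights=probs, k=n_samples)
        estimate = sum(draws) / n_samples
        sq_err += (estimate - true_value) ** 2
    return (sq_err / n_trials) ** 0.5

b = 10
uniform = [1.0 / b] * b
skewed = [0.91] + [0.01] * (b - 1)   # one successor dominates

print("uniform:", sample_update_rmse(uniform, n_samples=b))
print("skewed: ", sample_update_rmse(skewed, n_samples=b))
```

Under these assumptions the skewed case should show the smaller error for the same number of samples: most samples land on the dominant successor, so a few cheap sample updates already capture most of what the full expected update would compute.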