Chapter 12
Exercises
12.1
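$$
\begin{aligned}
G_{t:t+n} &= R_{t+1} + \gamma R_{t+2} + \dots + \gamma^{n} v(S_{t+n}, w_{t+n-1}) \\
&= R_{t+1} + \gamma\left(R_{t+2} + \dots + \gamma^{n-1} v(S_{t+n}, w_{t+n-1})\right) \\
&= R_{t+1} + \gamma G_{t+1:t+n}
\end{aligned}
$$

Special cases:

$$
\begin{aligned}
G_{t:t+1} &= R_{t+1} + \gamma v(S_{t+1}, w_t) \\
G_{t+1:t+1} &= v(S_{t+1}, w_t)
\end{aligned}
$$

Substituting into the definition of the λ-return:

$$
\begin{aligned}
G_t^{\lambda} &= (1 - \lambda)\sum_{n=1}^{\infty} \lambda^{n-1} G_{t:t+n} \\
&= (1 - \lambda)\sum_{n=1}^{\infty} \lambda^{n-1}\left(R_{t+1} + \gamma G_{t+1:t+n}\right) \\
&= (1 - \lambda)\left(\sum_{n=1}^{\infty} \lambda^{n-1} R_{t+1} + \gamma\lambda^{0} G_{t+1:t+1} + \lambda\sum_{n=1}^{\infty} \lambda^{n-1}\gamma G_{t+1:t+1+n}\right) \\
&= R_{t+1} + (1 - \lambda)\gamma v(S_{t+1}, w_t) + \gamma\lambda G_{t+1}^{\lambda}
\end{aligned}
$$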
12.2
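$$
\begin{aligned}
(1 - \lambda)\lambda^{\tau} &= \frac{1 - \lambda}{2} \\
\lambda^{\tau} &= \frac{1}{2} \\
\tau &= \log_{\lambda}\frac{1}{2} = -\frac{\ln 2}{\ln \lambda}
\end{aligned}
$$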
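As a quick numerical illustration of the half-life relation, the following sketch evaluates τ for a few values of λ and checks that λ^τ comes out as 1/2. The function name `half_life` is hypothetical and chosen only for this example.

```python
import math

def half_life(lam: float) -> float:
    """Return tau such that lam**tau == 1/2, for 0 < lam < 1."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    # tau = log_lambda(1/2), written with natural logarithms.
    return math.log(0.5) / math.log(lam)

for lam in (0.5, 0.9, 0.99):
    tau = half_life(lam)
    # The second column should print 0.5 up to rounding.
    print(f"lambda={lam}: tau={tau:.3f}, lambda**tau={lam ** tau:.3f}")
```

Note that τ is generally not an integer, so it is best read as a continuous time scale for how quickly the weighting decays.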
12.3
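Ingredients (w is held fixed throughout):

(1) $w_{t+1} = w_t + \alpha\left[G_t^{\lambda} - v(S_t, w_t)\right]\nabla v(S_t, w_t)$
(2) $\delta_t = R_{t+1} + \gamma v(S_{t+1}, w_t) - v(S_t, w_t)$
(3) $G_t^{\lambda} = R_{t+1} + (1 - \lambda)\gamma v(S_{t+1}, w_t) + \gamma\lambda G_{t+1}^{\lambda}$
(4) $G_t - V(S_t) = \sum_{k=t}^{T-1} \gamma^{k-t}\delta_k$ (the pattern to follow)

$$
\begin{aligned}
G_t^{\lambda} - v(S_t, w) &= R_{t+1} + (1 - \lambda)\gamma v(S_{t+1}, w) + \gamma\lambda G_{t+1}^{\lambda} - v(S_t, w) && \text{by (3)} \\
&= R_{t+1} + \gamma v(S_{t+1}, w) - v(S_t, w) - \lambda\gamma v(S_{t+1}, w) + \lambda\gamma G_{t+1}^{\lambda} \\
&= \delta_t + \lambda\gamma\left(G_{t+1}^{\lambda} - v(S_{t+1}, w)\right) && \text{by (2)} \\
&= \delta_t + \lambda\gamma\left(\delta_{t+1} + \lambda\gamma\left(G_{t+2}^{\lambda} - v(S_{t+2}, w)\right)\right) \\
&= \sum_{k=t}^{\infty} (\lambda\gamma)^{k-t}\delta_k
\end{aligned}
$$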
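The identity can be checked numerically on a short episode. The sketch below is an illustration under stated assumptions, not part of the original solution: values are a fixed list (so w never changes), the terminal state has value 0, and the helper names are hypothetical. The λ-return is built directly from n-step returns rather than from the recursion, so the comparison is not circular.

```python
import random

def n_step_return(rewards, values, t, n, gamma):
    """G_{t:t+n}: n discounted rewards plus the bootstrapped value of S_{t+n}."""
    g = sum(gamma ** (k - t) * rewards[k] for k in range(t, t + n))
    return g + gamma ** n * values[t + n]

def lambda_return(rewards, values, t, gamma, lam):
    """Episodic lambda-return assembled from the n-step returns."""
    T = len(rewards)
    g = sum((1 - lam) * lam ** (n - 1) * n_step_return(rewards, values, t, n, gamma)
            for n in range(1, T - t))
    # The remaining weight lam**(T-t-1) goes to the full return G_{t:T}.
    return g + lam ** (T - t - 1) * n_step_return(rewards, values, t, T - t, gamma)

def td_error_sum(rewards, values, t, gamma, lam):
    """Sum over k >= t of (gamma*lam)**(k-t) * delta_k with fixed values."""
    T = len(rewards)
    return sum((gamma * lam) ** (k - t)
               * (rewards[k] + gamma * values[k + 1] - values[k])
               for k in range(t, T))

random.seed(0)
T, gamma, lam = 6, 0.9, 0.7
rewards = [random.uniform(-1, 1) for _ in range(T)]          # rewards[k] is R_{k+1}
values = [random.uniform(-1, 1) for _ in range(T)] + [0.0]   # values[T] = 0 (terminal)

gaps = [abs(lambda_return(rewards, values, t, gamma, lam) - values[t]
            - td_error_sum(rewards, values, t, gamma, lam))
        for t in range(T)]
print("largest discrepancy:", max(gaps))
```

In the episodic case the infinite sum stops at T-1, because every TD error after termination is zero.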
12.4
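TD(λ), with w held fixed and $z_{-1} = 0$:

$$
\begin{aligned}
\Delta w &= \alpha\sum_{t} \delta_t z_t \\
z_t &= \gamma\lambda z_{t-1} + \nabla v(S_t, w) \\
z_{t+1} &= \gamma\lambda z_t + \nabla v(S_{t+1}, w) \\
&= \gamma\lambda\left(\gamma\lambda z_{t-1} + \nabla v(S_t, w)\right) + \nabla v(S_{t+1}, w) \\
&= (\gamma\lambda)^2 z_{t-1} + \gamma\lambda\nabla v(S_t, w) + \nabla v(S_{t+1}, w) \\
z_t &= \sum_{k=0}^{t} (\gamma\lambda)^{t-k}\nabla v(S_k, w) \\
\Delta w &= \alpha\sum_{t=0}^{T-1} \delta_t \sum_{k=0}^{t} (\gamma\lambda)^{t-k}\nabla v(S_k, w)
\end{aligned}
$$

Swapping the order of summation and applying the result of 12.3 to the inner sum gives

$$
\Delta w = \alpha\sum_{k=0}^{T-1} \nabla v(S_k, w)\sum_{t=k}^{T-1} (\gamma\lambda)^{t-k}\delta_t = \alpha\sum_{k=0}^{T-1} \left[G_k^{\lambda} - v(S_k, w)\right]\nabla v(S_k, w),
$$

which is the sum of the off-line λ-return updates. For another write-up of this proof, refer to LyWangPx.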
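The claim of this exercise can also be checked numerically. The following is a minimal sketch, assuming linear function approximation (so the gradient of v is the feature vector), a terminal value of 0, and weights that are never actually changed during the episode. The function name `summed_updates` and the random test episode are hypothetical, introduced only for illustration.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def summed_updates(features, rewards, w, alpha, gamma, lam):
    """Return (TD(lambda) total, off-line lambda-return total) with w held fixed."""
    T, d = len(rewards), len(w)
    # v[t] = w . x(S_t); the terminal state S_T is given value 0.
    v = [dot(w, x) for x in features] + [0.0]

    # Backward view: accumulate alpha * delta_t * z_t with accumulating traces.
    z = [0.0] * d
    td_total = [0.0] * d
    for t in range(T):
        z = [gamma * lam * zi + xi for zi, xi in zip(z, features[t])]
        delta = rewards[t] + gamma * v[t + 1] - v[t]
        td_total = [s + alpha * delta * zi for s, zi in zip(td_total, z)]

    # Forward view: lambda-returns from the recursion of Exercise 12.1.
    g_lam = [0.0] * (T + 1)
    for t in reversed(range(T)):
        g_lam[t] = rewards[t] + gamma * ((1 - lam) * v[t + 1] + lam * g_lam[t + 1])
    off_total = [0.0] * d
    for t in range(T):
        err = g_lam[t] - v[t]
        off_total = [s + alpha * err * xi for s, xi in zip(off_total, features[t])]
    return td_total, off_total

random.seed(1)
T, d = 8, 3
features = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(T)]
rewards = [random.uniform(-1, 1) for _ in range(T)]   # rewards[t] is R_{t+1}
w = [random.uniform(-1, 1) for _ in range(d)]

td_total, off_total = summed_updates(features, rewards, w, alpha=0.1, gamma=0.95, lam=0.8)
print(td_total)
print(off_total)
```

The two printed vectors should agree up to floating-point rounding. The agreement relies on w staying fixed; once the updates are applied online, the two algorithms differ.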
12.5
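(1) Hold the value function fixed, write $V_k = v(S_k, w)$, and let $h = t + n$:

$$
\begin{aligned}
G_{t:h} - G_{t:h-1} &= \delta_t + V_t + \gamma(G_{t+1:t+n} - V_{t+1}) - \delta_t - V_t - \gamma(G_{t+1:t+n-1} - V_{t+1}) \\
&= \gamma(G_{t+1:t+n} - G_{t+1:t+n-1}) \\
&= \dots \\
&= \gamma^{n-1}(G_{t+n-1:t+n} - G_{t+n-1:t+n-1}) \\
&= \gamma^{n-1}(R_{t+n} + \gamma V_{t+n} - V_{t+n-1}) \\
&= \gamma^{h-t-1}\delta_{h-1}
\end{aligned}
$$

(2) The truncated λ-return with horizon h - 1 can be written as a difference of two sums:

$$
\begin{aligned}
&\sum_{n=1}^{h-t-1} \lambda^{n-1} G_{t:t+n} - \lambda\sum_{n=1}^{h-t-2} \lambda^{n-1} G_{t:t+n} \\
&= \sum_{n=1}^{h-t-2} \left(\lambda^{n-1} G_{t:t+n} - \lambda\cdot\lambda^{n-1} G_{t:t+n}\right) + \lambda^{h-t-2} G_{t:h-1} \\
&= (1 - \lambda)\sum_{n=1}^{h-t-2} \lambda^{n-1} G_{t:t+n} + \lambda^{h-t-2} G_{t:h-1} = G_{t:h-1}^{\lambda}
\end{aligned}
$$

(3) Combining (1) and (2):

$$
\begin{aligned}
G_{t:h}^{\lambda} &= (1 - \lambda)\sum_{n=1}^{h-t-1} \lambda^{n-1} G_{t:t+n} + \lambda^{h-t-1} G_{t:h} \\
&= \sum_{n=1}^{h-t-1} \lambda^{n-1} G_{t:t+n} - \lambda\sum_{n=1}^{h-t-1} \lambda^{n-1} G_{t:t+n} + \lambda^{h-t-1} G_{t:h} \\
&= \sum_{n=1}^{h-t-1} \lambda^{n-1} G_{t:t+n} - \lambda\sum_{n=1}^{h-t-2} \lambda^{n-1} G_{t:t+n} - \lambda^{h-t-1} G_{t:h-1} + \lambda^{h-t-1} G_{t:h} \\
&= (1 - \lambda)\sum_{n=1}^{h-t-2} \lambda^{n-1} G_{t:t+n} + \lambda^{h-t-2} G_{t:h-1} + \lambda^{h-t-1}\gamma^{h-t-1}\delta_{h-1} \\
&= G_{t:h-1}^{\lambda} + (\gamma\lambda)^{h-t-1}\delta_{h-1}
\end{aligned}
$$

Solving this recursively down to the base case $G_{t:t+1}^{\lambda} = G_{t:t+1} = V_t + \delta_t$ gives

$$
G_{t:h}^{\lambda} = V_t + \sum_{i=t}^{h-1} (\gamma\lambda)^{i-t}\delta_i .
$$

(4) Intuitively, extending the horizon by one step, from $G_{t:h-1}^{\lambda}$ to $G_{t:h}^{\lambda}$, adds one more estimated TD error δ, decayed by γ and λ for each step it lies in the future.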
12.6
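Loop for i in F(S,A):
    δ -= w_i
s = ∑_{i in F(S,A)} z_i        (sum of the already-decayed traces of the active features)
Loop for i in F(S,A):
    z_i += 1 - α s             (dutch trace (12.11) for binary features)
...
Loop for i in F(S',A'): δ += γ w_i
w += α δ z
z = γλ z
S = S'; A = A'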
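To make the trace update concrete, here is a small sketch comparing a sparse, binary-feature form of the dutch trace with the dense vector form of (12.11), z_t = γλ z_{t-1} + (1 - αγλ z_{t-1}ᵀx_t) x_t. Both function names are hypothetical, the active feature indices are assumed to be distinct, and the decay is applied at the start of the step here rather than at the end of the loop, which is equivalent.

```python
def dutch_trace_sparse(z_prev, active, alpha, gamma, lam):
    """Dutch trace update when x_t is binary with the given active indices."""
    z = [gamma * lam * zi for zi in z_prev]   # decay every component
    s = sum(z[i] for i in active)             # decayed traces of active features
    for i in active:                          # indices assumed distinct
        z[i] += 1.0 - alpha * s
    return z

def dutch_trace_dense(z_prev, x, alpha, gamma, lam):
    """Dense form: z = gl*z_prev + (1 - alpha*gl*(z_prev . x)) * x."""
    gl = gamma * lam
    zx = sum(zi * xi for zi, xi in zip(z_prev, x))
    return [gl * zi + (1.0 - alpha * gl * zx) * xi for zi, xi in zip(z_prev, x)]

alpha, gamma, lam, d = 0.1, 0.9, 0.8, 5
episode = [[0, 2], [2, 3], [1, 2, 4], [0, 4]]   # active features per step (made up)

z_sparse = [0.0] * d
z_dense = [0.0] * d
for active in episode:
    x = [1.0 if i in active else 0.0 for i in range(d)]
    z_sparse = dutch_trace_sparse(z_sparse, active, alpha, gamma, lam)
    z_dense = dutch_trace_dense(z_dense, x, alpha, gamma, lam)

print(z_sparse)
print(z_dense)
```

The sum s has to be taken before any z_i is incremented; otherwise features later in the loop would see traces that were already modified in the same step.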
12.7
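$G_t^{\lambda s} = (1-\lambda_{t+1})G_{t:t+1} + \gamma_{t+1}\lambda_{t+1}G_{t+1}^{\lambda s} + \lambda_{t+1}R_{t+1}$, where $G_{t:t+1} = R_{t+1} + \gamma_{t+1}v(S_{t+1}, w_t)$ is the one-step return.
Part I: $\lambda_{t+1}R_{t+1}$
Part II: $(1-\lambda_{t+1})G_{t:t+1}$
Part III: $\gamma_{t+1}\lambda_{t+1}G_{t+1}^{\lambda s}$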
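$$
\begin{aligned}
G_t^{\lambda s} &= R_{t+1} + \gamma_{t+1}\left((1-\lambda_{t+1})v(S_{t+1}, w_t) + \lambda_{t+1}G_{t+1}^{\lambda s}\right) \\
G_{t:h}^{\lambda s} &= R_{t+1} + \gamma_{t+1}\left((1-\lambda_{t+1})v(S_{t+1}, w_t) + \lambda_{t+1}G_{t+1:h}^{\lambda s}\right) \\
G_{t:h}^{\lambda a} &= R_{t+1} + \gamma_{t+1}\left((1-\lambda_{t+1})q(S_{t+1}, A_{t+1}, w_t) + \lambda_{t+1}G_{t+1:h}^{\lambda a}\right) \\
G_{t:h}^{\lambda a} &= R_{t+1} + \gamma_{t+1}\left((1-\lambda_{t+1})\bar{V}_t(S_{t+1}) + \lambda_{t+1}G_{t+1:h}^{\lambda a}\right)
\end{aligned}
$$

The first line is the untruncated state-based recursion. The last two lines are the Sarsa and Expected Sarsa forms of the action-based return, where $\bar{V}_t(S_{t+1})$ is the expected action value of $S_{t+1}$ under the target policy.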
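The truncated state-based recursion can be unrolled backward from the horizon. The sketch below is an illustration under assumptions not stated in the notes: the boundary condition is taken to be G_{h:h} = v(S_h), the value estimates are a fixed list (the weight index is ignored), and all arrays are indexed by time step so that `R[k]`, `gamma[k]` and `lam[k]` stand for R_k, γ_k and λ_k (index 0 is unused). The function name is hypothetical.

```python
def truncated_lambda_return_s(R, v, gamma, lam, t, h):
    """State-based truncated lambda-return G_{t:h} with per-step gamma and lambda."""
    g = v[h]  # assumed boundary condition: G_{h:h} = v(S_h)
    for k in range(h - 1, t - 1, -1):
        # G_{k:h} = R_{k+1} + gamma_{k+1} * ((1 - lam_{k+1}) v(S_{k+1}) + lam_{k+1} G_{k+1:h})
        g = R[k + 1] + gamma[k + 1] * ((1 - lam[k + 1]) * v[k + 1] + lam[k + 1] * g)
    return g

# Made-up data for time steps 0..5; entries at index 0 of R, gamma, lam are unused.
R = [0.0, 1.0, -0.5, 0.3, 2.0, -1.0]
v = [0.2, -0.1, 0.4, 0.0, 0.7, 0.5]
g_const, l_const = 0.9, 0.8
gamma = [g_const] * 6
lam = [l_const] * 6

t, h = 1, 5
recursive = truncated_lambda_return_s(R, v, gamma, lam, t, h)

# With constant gamma and lambda this should match the TD-error form from 12.5:
# v(S_t) + sum_{i=t}^{h-1} (gamma*lambda)^(i-t) * delta_i.
td_form = v[t] + sum((g_const * l_const) ** (i - t)
                     * (R[i + 1] + g_const * v[i + 1] - v[i])
                     for i in range(t, h))
print(recursive, td_form)
```

With h = t + 1 the loop runs once and returns the one-step return R_{t+1} + γ_{t+1} v(S_{t+1}), since the (1 - λ) and λ weights on v(S_{t+1}) add up to one.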