Chapter 12

Exercises

12.1

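$$
\begin{aligned}
G_{t:t+n} &= R_{t+1} + \gamma R_{t+2} + \dots + \gamma^{n}\hat{v}(S_{t+n}, w_{t+n-1}) \\
&= R_{t+1} + \gamma\left(R_{(t+1)+1} + \dots + \gamma^{n-1}\hat{v}(S_{(t+1)+(n-1)}, w_{(t+1)+(n-1)-1})\right) \\
&= R_{t+1} + \gamma G_{t+1:t+n}
\end{aligned}
$$

$$G_{t:t+1} = R_{t+1} + \gamma\hat{v}(S_{t+1}, w_t)$$

$$G_{t+1:t+1} = \hat{v}(S_{t+1}, w_t)$$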

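$$
\begin{aligned}
G_t^{\lambda} &= (1-\lambda)\sum_{n=1}^{\infty}\lambda^{n-1}G_{t:t+n} \\
&= (1-\lambda)\sum_{n=1}^{\infty}\lambda^{n-1}\left(R_{t+1} + \gamma G_{t+1:t+n}\right) \\
&= (1-\lambda)\left(\sum_{n=1}^{\infty}\lambda^{n-1}R_{t+1} + \gamma\lambda^{0}G_{t+1:t+1} + \lambda\sum_{n=1}^{\infty}\lambda^{n-1}\gamma G_{t+1:t+1+n}\right) \\
&= R_{t+1} + (1-\lambda)\gamma\hat{v}(S_{t+1}, w_t) + \gamma\lambda G_{t+1}^{\lambda}
\end{aligned}
$$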
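
A quick numerical sanity check of the recursion, as a sketch only. It assumes an episodic task, a fixed value table, and a terminal value of 0; the names `rewards`, `values`, and the helper functions are made up for this illustration. In the episodic case the weight of every n-step return that reaches past termination collapses onto the full return.

```python
import random

def n_step_return(rewards, values, t, n, gamma):
    """G_{t:t+n} under a fixed value table; rewards[k] holds R_{k+1}."""
    g = sum(gamma**k * rewards[t + k] for k in range(n))
    return g + gamma**n * values[t + n]

def lambda_return_direct(rewards, values, t, gamma, lam):
    """(1 - lam) * sum_n lam^(n-1) G_{t:t+n}, with the tail weight on the full return."""
    T = len(rewards)
    head = sum((1 - lam) * lam**(n - 1) * n_step_return(rewards, values, t, n, gamma)
               for n in range(1, T - t))
    return head + lam**(T - t - 1) * n_step_return(rewards, values, t, T - t, gamma)

def lambda_return_recursive(rewards, values, t, gamma, lam):
    """R_{t+1} + (1 - lam) * gamma * v(S_{t+1}) + gamma * lam * G^lam_{t+1}."""
    if t == len(rewards):
        return 0.0  # nothing is collected after termination
    nxt = lambda_return_recursive(rewards, values, t + 1, gamma, lam)
    return rewards[t] + (1 - lam) * gamma * values[t + 1] + gamma * lam * nxt

# Hypothetical random episode of length 6 (illustrative data only).
rng = random.Random(0)
rewards = [rng.uniform(-1, 1) for _ in range(6)]
values = [rng.uniform(-1, 1) for _ in range(6)] + [0.0]  # v(S_T) = 0
gap = max(abs(lambda_return_direct(rewards, values, t, 0.9, 0.7)
              - lambda_return_recursive(rewards, values, t, 0.9, 0.7))
          for t in range(6))
print(gap)  # should be at floating-point noise level
```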

12.2

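$$(1-\lambda)\lambda^{\tau} = \frac{1-\lambda}{2}$$

$$\lambda^{\tau} = \frac{1}{2}$$

$$\tau = \log_{\lambda}\frac{1}{2} = \frac{\ln(1/2)}{\ln\lambda}$$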
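
The half-life formula is easy to evaluate directly. A minimal sketch, assuming 0 < λ < 1; the function name `half_life` is made up for this note.

```python
import math

def half_life(lam):
    """Number of steps tau with lam**tau == 1/2, for 0 < lam < 1."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    return math.log(0.5) / math.log(lam)

for lam in (0.5, 0.9, 0.99):
    print(lam, half_life(lam))
```

Larger λ gives a longer half-life, so the weighting reaches further into the future.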

12.3

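  1. $w_{t+1} = w_t + \alpha\left[G_t^{\lambda} - \hat{v}(S_t, w_t)\right]\nabla\hat{v}(S_t, w_t)$

  2. $\delta_t = R_{t+1} + \gamma\hat{v}(S_{t+1}, w_t) - \hat{v}(S_t, w_t)$

  3. $G_t^{\lambda} = R_{t+1} + (1-\lambda)\gamma\hat{v}(S_{t+1}, w_t) + \gamma\lambda G_{t+1}^{\lambda}$

  4. $G_t - V(S_t) = \sum_{k=t}^{T-1}\gamma^{k-t}\delta_k$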

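With a single fixed weight vector $w$:

$$
\begin{aligned}
G_t^{\lambda} - \hat{v}(S_t, w) &= R_{t+1} + (1-\lambda)\gamma\hat{v}(S_{t+1}, w) + \gamma\lambda G_{t+1}^{\lambda} - \hat{v}(S_t, w) && \text{by (3)} \\
&= R_{t+1} + \gamma\hat{v}(S_{t+1}, w) - \hat{v}(S_t, w) - \lambda\gamma\hat{v}(S_{t+1}, w) + \lambda\gamma G_{t+1}^{\lambda} \\
&= \delta_t + \lambda\gamma\left(G_{t+1}^{\lambda} - \hat{v}(S_{t+1}, w)\right) && \text{by (2)} \\
&= \delta_t + \lambda\gamma\left(\delta_{t+1} + \lambda\gamma\left(G_{t+2}^{\lambda} - \hat{v}(S_{t+2}, w)\right)\right) \\
&= \sum_{k=t}^{\infty}(\lambda\gamma)^{k-t}\delta_k
\end{aligned}
$$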
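
The identity can be checked numerically. This is a sketch under assumptions: an episodic task, a fixed value table with terminal value 0 (so the infinite sum stops at T-1), and made-up names throughout.

```python
import random

def lambda_return(rewards, values, t, gamma, lam):
    """Recursive lambda-return; rewards[k] holds R_{k+1}, values[T] = 0."""
    if t == len(rewards):
        return 0.0
    nxt = lambda_return(rewards, values, t + 1, gamma, lam)
    return rewards[t] + (1 - lam) * gamma * values[t + 1] + gamma * lam * nxt

def td_error_sum(rewards, values, t, gamma, lam):
    """sum_{k >= t} (gamma * lam)^(k - t) * delta_k, truncated at termination."""
    total = 0.0
    for k in range(t, len(rewards)):
        delta = rewards[k] + gamma * values[k + 1] - values[k]
        total += (gamma * lam) ** (k - t) * delta
    return total

# Hypothetical random episode (illustrative data only).
rng = random.Random(1)
rewards = [rng.uniform(-1, 1) for _ in range(8)]
values = [rng.uniform(-1, 1) for _ in range(8)] + [0.0]
gap = max(abs(lambda_return(rewards, values, t, 0.95, 0.6) - values[t]
              - td_error_sum(rewards, values, t, 0.95, 0.6))
          for t in range(8))
print(gap)  # should be at floating-point noise level
```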

12.4

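TD(λ), with the updates accumulated and applied at the end of the episode (so $w$ is fixed within the episode):

$$\Delta w = \alpha\sum_t \delta_t z_t$$

$$z_t = \gamma\lambda z_{t-1} + \nabla\hat{v}(S_t, w), \qquad z_{-1} = 0$$

$$
\begin{aligned}
z_{t+1} &= \gamma\lambda z_t + \nabla\hat{v}(S_{t+1}, w) \\
&= \gamma\lambda\left(\gamma\lambda z_{t-1} + \nabla\hat{v}(S_t, w)\right) + \nabla\hat{v}(S_{t+1}, w) \\
&= (\gamma\lambda)^2 z_{t-1} + \gamma\lambda\nabla\hat{v}(S_t, w) + \nabla\hat{v}(S_{t+1}, w)
\end{aligned}
$$

$$z_t = \sum_{k=0}^{t}(\gamma\lambda)^{t-k}\nabla\hat{v}(S_k, w)$$

$$\Delta w = \alpha\sum_t \delta_t\sum_{k=0}^{t}(\gamma\lambda)^{t-k}\nabla\hat{v}(S_k, w)$$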

I don't know how to finish the proof; refer to LyWangPx.
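
One possible way to close the gap, offered as a sketch rather than a definitive proof: swap the order of the double sum so that each gradient $\nabla\hat{v}(S_k, w)$ is multiplied by $\sum_{t \ge k}(\gamma\lambda)^{t-k}\delta_t$, which by 12.3 is $G_k^{\lambda} - \hat{v}(S_k, w)$, i.e. the off-line λ-return error. The code below checks that numerically. It assumes linear function approximation, accumulating traces, weights frozen during the episode, and a terminal value of 0; all names are made up for this note.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def offline_lambda_update(features, rewards, w, alpha, gamma, lam):
    """alpha * sum_t (G_t^lam - v(S_t)) * x_t with w held fixed."""
    T = len(rewards)
    values = [dot(w, x) for x in features] + [0.0]  # terminal value is 0
    delta_w = [0.0] * len(w)
    g_next = 0.0  # lambda-return from the terminal state
    for t in reversed(range(T)):
        g = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * g_next)
        for i, xi in enumerate(features[t]):
            delta_w[i] += alpha * (g - values[t]) * xi
        g_next = g
    return delta_w

def td_lambda_update(features, rewards, w, alpha, gamma, lam):
    """alpha * sum_t delta_t * z_t with accumulating traces and w held fixed."""
    T = len(rewards)
    values = [dot(w, x) for x in features] + [0.0]
    z = [0.0] * len(w)
    delta_w = [0.0] * len(w)
    for t in range(T):
        z = [gamma * lam * zi + xi for zi, xi in zip(z, features[t])]
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        delta_w = [d + alpha * delta * zi for d, zi in zip(delta_w, z)]
    return delta_w

# Hypothetical episode: 7 steps, 3 features (illustrative data only).
rng = random.Random(2)
features = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(7)]
rewards = [rng.uniform(-1, 1) for _ in range(7)]
w = [rng.uniform(-1, 1) for _ in range(3)]
a = offline_lambda_update(features, rewards, w, 0.1, 0.9, 0.8)
b = td_lambda_update(features, rewards, w, 0.1, 0.9, 0.8)
gap = max(abs(x - y) for x, y in zip(a, b))
print(gap)  # should be at floating-point noise level
```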

12.5

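Write $V_k = \hat{v}(S_k, w)$ for the fixed value function, $\delta_k = R_{k+1} + \gamma V_{k+1} - V_k$, and $h = t + n$.

(1) $G_{t:h} - G_{t:h-1}$

$$
\begin{aligned}
&= \delta_t + V_t + \gamma(G_{t+1:t+n} - V_{t+1}) - \delta_t - V_t - \gamma(G_{t+1:t+n-1} - V_{t+1}) \\
&= \gamma(G_{t+1:t+n} - G_{t+1:t+n-1}) \\
&= \dots \\
&= \gamma^{n-1}(G_{t+n-1:t+n} - G_{t+n-1:t+n-1}) \\
&= \gamma^{n-1}(R_{t+n} + \gamma V_{t+n} - V_{t+n-1}) \\
&= \gamma^{h-t-1}\delta_{h-1}
\end{aligned}
$$

(2) $\sum_{n=1}^{h-t-1}\lambda^{n-1}G_{t:t+n} - \lambda\sum_{n=1}^{h-t-2}\lambda^{n-1}G_{t:t+n}$

$$
\begin{aligned}
&= \sum_{n=1}^{h-t-2}\left(\lambda^{n-1}G_{t:t+n} - \lambda\cdot\lambda^{n-1}G_{t:t+n}\right) + \lambda^{h-t-2}G_{t:t+(h-t-1)} \\
&= (1-\lambda)\sum_{n=1}^{h-t-2}\lambda^{n-1}G_{t:t+n} + \lambda^{h-t-2}G_{t:h-1} \\
&= G_{t:h-1}^{\lambda}
\end{aligned}
$$

(3) $G_{t:h}^{\lambda} = (1-\lambda)\sum_{n=1}^{h-t-1}\lambda^{n-1}G_{t:t+n} + \lambda^{h-t-1}G_{t:h}$

$$
\begin{aligned}
&= \sum_{n=1}^{h-t-1}\lambda^{n-1}G_{t:t+n} - \lambda\sum_{n=1}^{h-t-1}\lambda^{n-1}G_{t:t+n} + \lambda^{h-t-1}G_{t:h} \\
&= \sum_{n=1}^{h-t-1}\lambda^{n-1}G_{t:t+n} - \lambda\sum_{n=1}^{h-t-2}\lambda^{n-1}G_{t:t+n} - \lambda^{h-t-1}G_{t:h-1} + \lambda^{h-t-1}G_{t:h} \\
&= (1-\lambda)\sum_{n=1}^{h-t-2}\lambda^{n-1}G_{t:t+n} + \lambda^{h-t-2}G_{t:h-1} + \lambda^{h-t-1}\gamma^{h-t-1}\delta_{h-1} && \text{by (2) and (1)} \\
&= G_{t:h-1}^{\lambda} + (\gamma\lambda)^{h-t-1}\delta_{h-1}
\end{aligned}
$$

Unrolling this recursion down to $G_{t:t+1}^{\lambda} = G_{t:t+1} = V_t + \delta_t$ gives

$$G_{t:h}^{\lambda} = V_t + \sum_{i=t}^{h-1}(\gamma\lambda)^{i-t}\delta_i$$

(4) Intuitively, extending the horizon by one step, from $h-1$ to $h$, adds the newest TD error $\delta_{h-1}$, decayed by $\gamma\lambda$ once for every step between $t$ and $h-1$.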
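
A numerical check of the TD-error form of the truncated λ-return, as a sketch only. It assumes a fixed value table and constant γ and λ; the names and the random data are made up for this note.

```python
import random

def n_step_return(rewards, values, t, h, gamma):
    """G_{t:h} = R_{t+1} + ... + gamma^(h-t-1) R_h + gamma^(h-t) v(S_h)."""
    g = sum(gamma ** (k - t) * rewards[k] for k in range(t, h))  # rewards[k] is R_{k+1}
    return g + gamma ** (h - t) * values[h]

def truncated_lambda_return(rewards, values, t, h, gamma, lam):
    """(1 - lam) * sum_{n=1}^{h-t-1} lam^(n-1) G_{t:t+n} + lam^(h-t-1) G_{t:h}."""
    head = sum((1 - lam) * lam ** (n - 1) * n_step_return(rewards, values, t, t + n, gamma)
               for n in range(1, h - t))
    return head + lam ** (h - t - 1) * n_step_return(rewards, values, t, h, gamma)

def td_error_form(rewards, values, t, h, gamma, lam):
    """v(S_t) + sum_{i=t}^{h-1} (gamma * lam)^(i-t) * delta_i."""
    total = values[t]
    for i in range(t, h):
        delta = rewards[i] + gamma * values[i + 1] - values[i]
        total += (gamma * lam) ** (i - t) * delta
    return total

# Hypothetical trajectory segment (illustrative data only).
rng = random.Random(3)
rewards = [rng.uniform(-1, 1) for _ in range(6)]
values = [rng.uniform(-1, 1) for _ in range(7)]
gap = max(abs(truncated_lambda_return(rewards, values, t, h, 0.9, 0.7)
              - td_error_form(rewards, values, t, h, 0.9, 0.7))
          for t in range(6) for h in range(t + 1, 7))
print(gap)  # should be at floating-point noise level
```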

12.6

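    Loop for i in F(S,A): δ -= wi
    zx = ∑i in F(S,A) zi                    (z·x for binary features, taken before z changes)
    Loop for i in F(S,A): zi += 1 - α·zx    (dutch trace (12.11); z was already decayed by γλ)
    ...
    Loop for i in F(S',A'): δ += γwi
    w += αδz
    z = γλz
    S = S'; A = A'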
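
To see how the dutch trace behaves with binary features, the sketch below compares a sparse update over the active feature indices with the dense vector form $z_t = \gamma\lambda z_{t-1} + (1 - \alpha\gamma\lambda z_{t-1}^{\top}x_t)x_t$. This is an illustration under assumptions, not the book's pseudocode: the function names are invented, the active indices are assumed to be distinct, and the decay is applied at the start of the update instead of at the end of the previous step.

```python
def dutch_trace_sparse(z, active, alpha, gamma, lam):
    """Decay z, then add the dutch increment on the active binary features."""
    z = [gamma * lam * zi for zi in z]
    zx = sum(z[i] for i in active)  # (gamma * lam * z_prev) . x for binary x
    for i in active:
        z[i] += 1.0 - alpha * zx
    return z

def dutch_trace_dense(z, x, alpha, gamma, lam):
    """Dense vector form of the dutch trace update."""
    zx = sum(zi * xi for zi, xi in zip(z, x))
    scale = 1.0 - alpha * gamma * lam * zx
    return [gamma * lam * zi + scale * xi for zi, xi in zip(z, x)]

# Hypothetical trace vector with features 0 and 2 active.
z_prev = [0.3, -0.2, 0.5, 0.0]
sparse = dutch_trace_sparse(z_prev, [0, 2], alpha=0.1, gamma=0.9, lam=0.8)
dense = dutch_trace_dense(z_prev, [1.0, 0.0, 1.0, 0.0], alpha=0.1, gamma=0.9, lam=0.8)
gap = max(abs(a - b) for a, b in zip(sparse, dense))
print(gap)  # should be at floating-point noise level
```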

12.7

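$$G_t^{\lambda s} = (1-\lambda_{t+1})G_{t:t+1} + \gamma_{t+1}\lambda_{t+1}G_{t+1}^{\lambda s} + \lambda_{t+1}R_{t+1}$$

where $G_{t:t+1} = R_{t+1} + \gamma_{t+1}\hat{v}(S_{t+1}, w_t)$ is the one-step return.

Part I: $\lambda_{t+1}R_{t+1}$

Part II: $(1-\lambda_{t+1})G_{t:t+1}$

Part III: $\gamma_{t+1}\lambda_{t+1}G_{t+1}^{\lambda s}$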

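$$G_t^{\lambda s} = R_{t+1} + \gamma_{t+1}\left((1-\lambda_{t+1})\hat{v}(S_{t+1}, w_t) + \lambda_{t+1}G_{t+1}^{\lambda s}\right)$$

$$G_{t:h}^{\lambda s} = R_{t+1} + \gamma_{t+1}\left((1-\lambda_{t+1})\hat{v}(S_{t+1}, w_t) + \lambda_{t+1}G_{t+1:h}^{\lambda s}\right)$$

$$G_{t:h}^{\lambda a} = R_{t+1} + \gamma_{t+1}\left((1-\lambda_{t+1})\hat{q}(S_{t+1}, A_{t+1}, w_t) + \lambda_{t+1}G_{t+1:h}^{\lambda a}\right)$$

Expected Sarsa form, with $\bar{V}_t(s) = \sum_a \pi(a \mid s)\,\hat{q}(s, a, w_t)$:

$$G_{t:h}^{\lambda a} = R_{t+1} + \gamma_{t+1}\left((1-\lambda_{t+1})\bar{V}_t(S_{t+1}) + \lambda_{t+1}G_{t+1:h}^{\lambda a}\right)$$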
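
A small sketch of the state-based truncated recursion with per-step γ and λ. The boundary condition $G_{h:h}^{\lambda s} = \hat{v}(S_h)$ is an assumption of this sketch (the notes above do not state one), as are the function and variable names and the fixed value table. With constant γ and λ the result should match the TD-error form from 12.5, which the example checks.

```python
import random

def truncated_lambda_return_s(rewards, values, gammas, lams, t, h):
    """G^{lambda s}_{t:h}; rewards[k] is R_{k+1}, gammas[k] and lams[k] are gamma_k, lambda_k."""
    if t == h:
        return values[h]  # assumed boundary condition: G_{h:h} = v(S_h)
    nxt = truncated_lambda_return_s(rewards, values, gammas, lams, t + 1, h)
    g1, l1 = gammas[t + 1], lams[t + 1]
    return rewards[t] + g1 * ((1 - l1) * values[t + 1] + l1 * nxt)

def td_error_form(rewards, values, t, h, gamma, lam):
    """v(S_t) + sum_{i=t}^{h-1} (gamma * lam)^(i-t) * delta_i, for constant gamma and lam."""
    total = values[t]
    for i in range(t, h):
        delta = rewards[i] + gamma * values[i + 1] - values[i]
        total += (gamma * lam) ** (i - t) * delta
    return total

# Hypothetical trajectory segment with constant gamma = 0.9 and lambda = 0.7.
rng = random.Random(4)
rewards = [rng.uniform(-1, 1) for _ in range(5)]
values = [rng.uniform(-1, 1) for _ in range(6)]
gammas = [0.9] * 6
lams = [0.7] * 6
gap = max(abs(truncated_lambda_return_s(rewards, values, gammas, lams, t, h)
              - td_error_form(rewards, values, t, h, 0.9, 0.7))
          for t in range(5) for h in range(t + 1, 6))
print(gap)  # should be at floating-point noise level
```

Setting every entry of `lams` to 0 reduces the recursion to the one-step return, and setting every entry to 1 gives the plain truncated return up to horizon h.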