An introduction to the FRMQN model

  • Published on
    12-Jan-2017


Transcript

  • FRMQN

    2016/8/18

  • [1] J. Oh, V. Chockalingam, S. Singh, H. Lee. Control of Memory, Active Perception, and Action in Minecraft. arXiv:1605.09128, 2016. A memory-based extension of DQN, evaluated on Minecraft tasks.

  • DQN

    For background on DQN, see:

    [2] Paper introduction: Playing Atari with Deep Reinforcement Learning — http://www.slideshare.net/htsukahara/paper-intoduction-playing-atari-with-deep-reinforcement-learning
    [3] Playing Atari with Deep Reinforcement Learning — http://www.slideshare.net/mooopan/ss-30336609
    [4] Introduction to Deep Q-Learning — http://www.slideshare.net/ssuser07aa33/introduction-to-deep-q-learning

  • context

  • context

  • [1]Figure 2

  • [1]Figure 2

  • key

    [1]Figure 2

    key

  • CNN encoding

    [1]Figure 2

    Each observation x_t ∈ R^{c×h×w} (c channels, h×w pixels) is encoded by a CNN φ^enc:

    e_t = φ^enc(x_t)

    where e_t ∈ R^e is the encoded feature vector.

  • [1]Figure 2

    The last M encodings are concatenated column-wise into the memory block

    E_t = [e_{t-1}, e_{t-2}, ⋯, e_{t-M}]

    where E_t ∈ R^{e×M}; each column e_{t-i} of E_t is one encoded frame.
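The encoding and memory block above can be sketched in NumPy. The CNN φ^enc is replaced here by a plain random linear map (an assumption for brevity), and all dimension values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (the slide's symbols are c, h, w, e, M).
c, h, w = 3, 32, 32    # one observation x_t has c channels and h x w pixels
e_dim = 256            # size of the encoded observation e_t
M = 11                 # number of recent encodings kept in the memory block

# A random linear map standing in for the CNN encoder phi^enc (assumption:
# any function R^(c*h*w) -> R^e illustrates the shapes equally well).
W_enc = rng.standard_normal((e_dim, c * h * w)) * 0.01

def encode(x):
    """e_t = phi^enc(x_t): flatten the frame and project it to R^e."""
    return W_enc @ x.reshape(-1)

# The last M observations, encoded and stacked column-wise:
# E_t = [e_{t-1}, ..., e_{t-M}] in R^(e x M).
frames = [rng.standard_normal((c, h, w)) for _ in range(M)]
E_t = np.stack([encode(x) for x in frames], axis=1)
```

Each column of `E_t` is one encoded frame; the key and value memories on the later slides are linear transforms of this block.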

  • key

    [1]Figure 2

    The memory block E_t is encoded into keys by a linear transform W^key ∈ R^{m×e}, giving the product W^key E_t; W^key maps each e-dimensional column of E_t to an m-dimensional key.

  • key

    [1]Figure 2

    M_t^key = W^key E_t

    The key memory M_t^key ∈ R^{m×M} holds one m-dimensional key per memory slot; the same W^key is shared across all M columns of E_t.

  • [1]Figure 2

    Example: the key memory M_t^key at t = 2 is an m × M matrix, e.g.

    M_t^key = [ 2 5 3 ⋯ 4 ]
              [ 1 2 1 ⋯ 2 ]
              [ 2 6 2 ⋯ 3 ]
              [ 4 0 2 ⋯ 1 ]

  • [1]Figure 2

    The same memory block E_t is also encoded into values by a second linear transform W^val ∈ R^{m×e}, giving the product W^val E_t; W^val maps each e-dimensional column of E_t to an m-dimensional value.

  • [1]Figure 2

    M_t^val = W^val E_t

    The value memory M_t^val ∈ R^{m×M} holds one m-dimensional value per memory slot; the same W^val is shared across all M columns of E_t.

  • [1]Figure 2

    Example: the value memory M_t^val at t = 2 is an m × M matrix, e.g.

    M_t^val = [ 3 4 1 ⋯ 8 ]
              [ 7 0 1 ⋯ 2 ]
              [ 2 6 4 ⋯ 1 ]
              [ 1 2 2 ⋯ 0 ]
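Both memories are column-wise linear transforms of the same block E_t; a minimal sketch with random stand-in weights:

```python
import numpy as np

rng = np.random.default_rng(0)
e_dim, m, M = 256, 128, 11              # illustrative sizes

E_t = rng.standard_normal((e_dim, M))   # stands in for the memory block

# W^key, W^val in R^(m x e); each is shared across all M columns of E_t.
W_key = rng.standard_normal((m, e_dim)) * 0.01
W_val = rng.standard_normal((m, e_dim)) * 0.01

M_key = W_key @ E_t   # key memory   M_t^key in R^(m x M)
M_val = W_val @ E_t   # value memory M_t^val in R^(m x M)
```

The keys are used only to score memory slots against the context; the values carry the content that is actually read out.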

  • context

    The context vector h_t ∈ R^m summarizes the agent's state at time t.

    Example (m-dimensional, at t = 2): h_t = [3 2 7 1 ⋯ 4]^T

    [1]Figure 2

  • softmax

    The attention weight p_{t,i} ∈ R of memory slot i at time t is the softmax over the inner products of the m-dimensional context h_t with the key columns M_t^key[i]:

    p_{t,i} = exp(h_t^T M_t^key[i]) / Σ_{j=1}^{M} exp(h_t^T M_t^key[j])

    [1]Figure 2

  • context

    Example at t = 2: taking the inner product of h_t with each column M_t^key[i] (both m-dimensional) gives the score that the softmax turns into p_{t,i}.

    [1]Figure 2

  • pt

    [1]Figure 2

    Collecting the weights p_{t,i} over all M slots gives the attention distribution

    p_t = [p_{t,1}, p_{t,2}, ⋯, p_{t,M}],  p_t ∈ R^M
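The softmax attention over memory slots can be sketched as follows (random stand-in values; subtracting the maximum score is only for numerical stability):

```python
import numpy as np

rng = np.random.default_rng(0)
m, M = 128, 11

M_key = rng.standard_normal((m, M))   # stands in for M_t^key
h_t = rng.standard_normal(m)          # stands in for the context vector

# One inner product h_t . M_t^key[i] per memory slot, then a softmax.
scores = h_t @ M_key                  # shape (M,)
scores = scores - scores.max()        # numerical stability only
p_t = np.exp(scores) / np.exp(scores).sum()   # p_t in R^M, sums to 1
```

Slots whose keys align with the current context receive most of the weight.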

  • [1]Figure 2

    o_t = M_t^val p_t

    The retrieval o_t ∈ R^m is the value memory M_t^val weighted by the attention p_t.

  • [1]Figure 2

    Written out, o_t = Σ_{i=1}^{M} p_{t,i} M_t^val[i]: the M value columns of M_t^val (each in R^m) are summed, with column i weighted by p_{t,i}.
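The retrieval is a single matrix-vector product; the sketch below also checks the weighted-sum reading of the same formula:

```python
import numpy as np

rng = np.random.default_rng(0)
m, M = 128, 11

M_val = rng.standard_normal((m, M))     # stands in for M_t^val
p_t = rng.random(M)
p_t /= p_t.sum()                        # stands in for the attention weights

o_t = M_val @ p_t                       # retrieval o_t in R^m

# Same quantity written as the weighted sum  sum_i p_{t,i} * M_t^val[i].
o_sum = sum(p_t[i] * M_val[:, i] for i in range(M))
assert np.allclose(o_t, o_sum)
```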

  • [1]Figure 2

    q_t = φ^Q(h_t, o_t)

    The Q-values q_t are computed from the context h_t and the retrieval o_t; q_t ∈ R^a, one entry per action (a actions in total). Concretely:

    g_t = f(W^h h_t + o_t)
    q_t = W^q g_t
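A sketch of the Q head under the two equations above. The slide only names the nonlinearity f, so ReLU is assumed here, and a = 6 actions is an illustrative value:

```python
import numpy as np

rng = np.random.default_rng(0)
m, a = 128, 6   # a = number of actions (illustrative)

h_t = rng.standard_normal(m)   # context
o_t = rng.standard_normal(m)   # retrieval

W_h = rng.standard_normal((m, m)) * 0.01
W_q = rng.standard_normal((a, m)) * 0.01

g_t = np.maximum(0.0, W_h @ h_t + o_t)   # g_t = f(W^h h_t + o_t), ReLU assumed
q_t = W_q @ g_t                          # q_t in R^a: one Q-value per action
```

The agent acts greedily (or ε-greedily during training) with respect to `q_t`.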

  • context

  • context

    MQN, RMQN, and FRMQN differ in how the context h_t is computed.

  • context

    MQN RMQN FRMQN

  • MQN context

    [1]Figure 2

    MQN computes the context directly from the current CNN encoding:

    h_t = W^c e_t

    where W^c ∈ R^{m×e}; h_t is a linear transform of e_t alone, with no recurrence.

  • context

    MQN RMQN FRMQN

  • RMQN context

    [1]Figure 2

    RMQN computes the context recurrently (the R stands for recurrent), feeding the CNN encodings to an LSTM:

    [h_t, c_t] = LSTM(e_t, h_{t-1}, c_{t-1})

    The input is the current encoding e_t; the LSTM also takes the previous context h_{t-1} and the cell state c_{t-1} ∈ R^m (c_{t-1} enters the gates directly only when peephole connections are used).

  • context

    MQN RMQN FRMQN

  • FRMQN context

    FRMQN extends the RMQN LSTM by feeding back the previous retrieval (the F stands for feedback):

    [h_t, c_t] = LSTM([e_t, o_{t-1}], h_{t-1}, c_{t-1})

    The input is the concatenation of the current encoding e_t and the previous retrieval o_{t-1}; the LSTM again carries h_{t-1} and c_{t-1}. The retrieval o_t produced at this step becomes the feedback input at the next step.

    [1]Figure 2

  • [1]Figure 2

    Summary of the context computation: the encoding e_t (together with o_{t-1} in FRMQN) enters the LSTM along with h_{t-1} and c_{t-1} to produce h_t; the resulting retrieval o_t then feeds back as o_{t-1} at the next time step.
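The three context computations can be contrasted in one sketch. The LSTM step below is a plain (no-peephole) implementation, and all weights are random stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
e_dim, m = 256, 128

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One standard LSTM step: returns the new (h, c)."""
    z = W @ x + U @ h_prev + b
    i, f, o, g = np.split(z, 4)                  # input/forget/output/candidate
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))
    c = sig(f) * c_prev + sig(i) * np.tanh(g)
    h = sig(o) * np.tanh(c)
    return h, c

e_t = rng.standard_normal(e_dim)      # current encoding
o_prev = rng.standard_normal(m)       # previous retrieval o_{t-1}
h_prev = rng.standard_normal(m)       # previous context h_{t-1}
c_prev = rng.standard_normal(m)       # previous cell state c_{t-1}

# MQN: h_t = W^c e_t (feedforward, current encoding only).
W_c = rng.standard_normal((m, e_dim)) * 0.01
h_mqn = W_c @ e_t

# RMQN: [h_t, c_t] = LSTM(e_t, h_{t-1}, c_{t-1}).
W_r = rng.standard_normal((4 * m, e_dim)) * 0.01
U_r = rng.standard_normal((4 * m, m)) * 0.01
b_r = np.zeros(4 * m)
h_rmqn, _ = lstm_step(e_t, h_prev, c_prev, W_r, U_r, b_r)

# FRMQN: [h_t, c_t] = LSTM([e_t, o_{t-1}], h_{t-1}, c_{t-1}).
W_f = rng.standard_normal((4 * m, e_dim + m)) * 0.01
h_frmqn, _ = lstm_step(np.concatenate([e_t, o_prev]),
                       h_prev, c_prev, W_f, U_r, b_r)
```

Only the FRMQN variant sees o_{t-1}, which is what lets retrieved memories influence the next attention query.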

  • [1]Figure 3

  • context

  • I-Maze

  • I-Maze

  • I-Maze

    In the I-Maze task, an indicator observed near the start determines which goal yields positive reward ([1] Figure 10).

  • I-Maze

    - DQN, DRQN, and FRMQN are compared ([1] Figures 3, 6)
    - DRQN must carry the indicator through its recurrent context
    - FRMQN can instead retrieve the indicator observation from memory when it is needed

  • I-Maze

    [1]Figure 7a

    The attention analysis shows that FRMQN attends to the memory slot holding the indicator observation when deciding which goal to take.

  • I-Maze

  • [1]Figure 12

  • - DQN, DRQN

    - DRQN

    [1]Figure 3,6

    - DRQN

    - FRMQN context

  • I-Maze

  • [1]Figure 19

    I-Maze indicator

  • training data

    Performance on maps from the training data ([1] Figures 3, 6): FRMQN, RMQN, and DRQN are compared.

  • unseen map

    Performance on maps not seen in the training data ([1] Figures 3, 6): FRMQN and RMQN generalize better.

  • Minecraft FRMQN
