Skip to content

Regarding target calculations in DuelDDQN and indices #8

@EXJUSTICE

Description

@EXJUSTICE

Hi Phil,

I implemented your DuelDDQN architecture for myself, and was curious as to the following snippet of the learning function, as my question wasn't detailed in the course.

        q_pred = T.add(V_s,
                        (A_s - A_s.mean(dim=1, keepdim=True)))[indices, actions]

        q_next = T.add(V_s_, (A_s_ - A_s_.mean(dim=1, keepdim=True)))

        q_eval = T.add(V_s_eval, (A_s_eval - A_s_eval.mean(dim=1,keepdim=True)))

Why is it that only Q_pred is muliplied by the action maatrix, is it because it represents the actions we have just taken in the current state? Are all of these q_value matrices of the same dimensions?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions