subpolicy question, #15

Description

@EthanCodesss

Hi! First, in ppo.py:
self.policy = self.loss = -self.policy_loss + self.value_loss - self.entropy_loss
Your comment says 'Reduce sum over all sub-policies (where only the active sub-policy will be non-zero due to previous filtering)', but the loss will be a list. How can a list of losses be backpropagated?
self.train_step = self.optimizer.minimize(self.loss, var_list=policy_params)

Second, you first call '_create_sub_policy'. In that part the loss is reduced with a mean and ends up as a scalar. After filtering, all sub-policy modules will output the same value. Does this really work?
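
To make the question concrete, here is a minimal plain-Python sketch of the pattern the quoted comment seems to describe. The names and values (sub_policy_losses, active_mask) are made up for illustration and are not taken from ppo.py; the sketch only assumes that each sub-policy contributes one scalar loss and that filtering zeroes out the inactive ones.

```python
# Hypothetical per-sub-policy losses, one scalar each
# (e.g. what a reduce-mean over the batch would leave behind).
sub_policy_losses = [0.7, 1.2, 0.4]

# Hypothetical one-hot filter marking the active sub-policy (index 1 here).
active_mask = [0.0, 1.0, 0.0]

# Filtering zeroes the inactive entries; the result is still a list.
filtered = [loss * mask for loss, mask in zip(sub_policy_losses, active_mask)]

# A reduce-sum collapses the list into one scalar, which here equals
# the loss of the active sub-policy alone.
total_loss = sum(filtered)
```

If the repository's loss is reduced this way before reaching minimize, the optimizer would see a scalar rather than a list; whether that is what actually happens in ppo.py is the open question.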
