Subpolicy question

Hi! First, in ppo.py:
`self.policy = self.loss = -self.policy_loss + self.value_loss - self.entropy_loss`
You said "Reduce sum over all sub-policies (where only the active sub-policy will be non-zero due to previous filtering)", but the loss will be a list. How can a list of losses be backpropagated in
`self.train_step = self.optimizer.minimize(self.loss, var_list=policy_params)`?

Second, you first compute the loss in `_create_sub_policy`, where it is reduced with a mean and ends up as a scalar. After filtering, every sub-policy module will then output the same value. Does this really work?
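
To make both questions concrete, here is a minimal sketch of how I understand the filtering is supposed to behave. It is a toy in plain Python, not the code from ppo.py: each sub-policy has a single scalar parameter, the loss is a squared error, and the names `masked_losses`, `total_loss` and `numerical_grad` are hypothetical. The per-sub-policy losses form a list, inactive sub-policies are masked to zero, and the gradient is taken of the sum of that list. As far as I understand, `tf.gradients` likewise sums over the elements of a non-scalar target, which may be why a list of losses can still be minimized.

```python
# Toy illustration only: one scalar parameter per sub-policy and a
# squared-error loss. None of these names come from ppo.py.

def masked_losses(params, samples, active):
    """Return one mean loss per sub-policy, zeroed for inactive ones."""
    losses = []
    for k, theta in enumerate(params):
        mask = 1.0 if k == active else 0.0  # the "filtering" step
        per_sample = [mask * (theta - x) ** 2 for x in samples]
        losses.append(sum(per_sample) / len(per_sample))  # reduce mean
    return losses


def total_loss(params, samples, active):
    """Reduce sum over all sub-policies; only the active term is non-zero."""
    return sum(masked_losses(params, samples, active))


def numerical_grad(params, samples, active, eps=1e-6):
    """Central-difference gradient of the summed loss w.r.t. each parameter."""
    grads = []
    for k in range(len(params)):
        up = list(params)
        down = list(params)
        up[k] += eps
        down[k] -= eps
        diff = total_loss(up, samples, active) - total_loss(down, samples, active)
        grads.append(diff / (2 * eps))
    return grads


params = [0.5, -1.0, 2.0]   # three sub-policies
samples = [1.0, 3.0]
losses = masked_losses(params, samples, active=1)
grads = numerical_grad(params, samples, active=1)
print(losses)
print(grads)
```

In this toy, only the active sub-policy receives a non-zero gradient, so summing the list does not mix updates between sub-policies. What I cannot tell is whether the same holds in ppo.py when the mean is taken inside `_create_sub_policy` before the filtering is applied.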
