Notice the discount factor is 1, which should mean only one of the reward values would be picked. Setting all the others to the same value is probably superfluous, and it clearly hurts readability, but it makes for easier experimentation with other discount values.
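A minimal sketch (not from the repository; the helper discounted_return is hypothetical) of how the discount factor interacts with the replicated reward: for a constant-reward trajectory, the discounted return collapses to the reward times a geometric sum, so the same trace works under any discount value.

def discounted_return(rewards, gamma):
  """Sum of gamma**t * r_t over the trajectory."""
  return sum(gamma**t * r for t, r in enumerate(rewards))

rewards = [0.7] * 4  # the same reward written at every timestep
print(discounted_return(rewards, gamma=1.0))  # 2.8: every step contributes
print(discounted_return(rewards, gamma=0.0))  # 0.7: only the first step counts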
import tensorflow as tf


def _overwrite_trajectory_reward(sequence_example: tf.train.SequenceExample,
                                 reward: float) -> tf.train.SequenceExample:
  """Overwrite the reward in the trace (sequence_example) with the given one.

  Args:
    sequence_example: A tf.SequenceExample proto describing compilation trace.
    reward: The reward to overwrite with.

  Returns:
    The tf.SequenceExample proto after post-processing.
  """
  # Every feature list in the trace has the same length; read it off the
  # first one.
  sequence_length = len(
      next(iter(sequence_example.feature_lists.feature_list.values())).feature)

  # Write the same reward at every timestep of the trajectory.
  reward_list = sequence_example.feature_lists.feature_list['reward']
  for _ in range(sequence_length):
    added_feature = reward_list.feature.add()
    added_feature.float_list.value.append(reward)

  return sequence_example
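A hedged usage sketch: the feature name 'inlining_decision' and the reward value below are illustrative, not from the source. A trace with one feature list of length two gets two reward entries written.

example = tf.train.SequenceExample()
obs = example.feature_lists.feature_list['inlining_decision']
for decision in (0, 1):
  obs.feature.add().int64_list.value.append(decision)

example = _overwrite_trajectory_reward(example, reward=0.7)
print(example.feature_lists.feature_list['reward'].feature[0].float_list.value)
# [0.7]

Note that the function appends to the 'reward' feature list rather than clearing it first, so it assumes the incoming trace does not already carry rewards.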