Conversation

Contributor

@sven1977 sven1977 commented Oct 31, 2020

  • RLlib's attention nets (GTrXL) have so far been forced to run "inside" RLlib's RNN API (previous internal states are passed back in as new state-ins at subsequent timesteps). This is not a good fit for attention nets, which need different handling and time-slicing of past internal states (the attention net's memory). The trajectory view API allows specifying the exact timestep ranges needed for forward passes and batched train passes through attention nets (see the sketch right after this list).
  • Besides the above, the handling of the attention nets' tau-memory was also incorrect. This PR fixes those existing bugs.
  • In a follow-up PR, the torch version of GTrXL will be fully included in the testing as well (to make sure it is 100% on par with the tf version).
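To make the first point concrete, here is a purely illustrative sketch (plain NumPy, not RLlib's actual API; `memory_tau` and the dict keys are made-up names) of the kind of time-slicing an attention net needs: each forward pass at timestep t consumes the current observation plus a window of the most recent memory outputs, rather than a single recurrent state handed forward step by step.

```python
import numpy as np

# Illustrative only; names are assumptions, not RLlib's actual fields.
memory_tau = 50
T, obs_dim, mem_dim = 200, 4, 32

obs = np.random.randn(T, obs_dim)
memory_out = np.random.randn(T, mem_dim)  # memory outputs from earlier passes

t = 123
# The attention net's input at timestep t: current obs + last <=50 memory outs.
model_input = {
    "obs": obs[t],
    "memory_in": memory_out[max(0, t - memory_tau):t],
}
```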

Why are these changes needed?

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

…ectory_view_api_attention_nets

# Conflicts:
#	rllib/agents/ppo/ppo_tf_policy.py
#	rllib/evaluation/tests/test_trajectory_view_api.py
#	rllib/policy/tf_policy_template.py
#	rllib/utils/tf_ops.py
…on_nets

# Conflicts:
#	rllib/agents/ppo/ppo_tf_policy.py
#	rllib/evaluation/collectors/simple_list_collector.py
#	rllib/evaluation/tests/test_trajectory_view_api.py
#	rllib/policy/dynamic_tf_policy.py
#	rllib/policy/policy.py
#	rllib/policy/sample_batch.py
#	rllib/policy/view_requirement.py
def __init__(self, shift_before: int = 0):
    self.shift_before = max(shift_before, 1)

def __init__(self, view_reqs):
    self.shift_before = -min(
Contributor

Might want to add a comment to describe what this code does!

Contributor Author

done

def add_init_obs(self, episode_id: EpisodeID, agent_index: int,
                 env_id: EnvID, t: int, init_obs: TensorType,
                 view_requirements: Dict[str, ViewRequirement]) -> None:
    """Adds an initial observation (after reset) to the Agent's trajectory.
Contributor

Change the description. It adds more than a single observation.

Contributor Author

No, it doesn't. It really just adds a single one, same as it used to work with SampleBatchBuilder.
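To illustrate the granularity being discussed, a tiny self-contained stand-in (not RLlib's actual collector, just a stub with the same signature) where one call stores exactly one initial observation:

```python
from typing import Any, Dict


class _FakeCollector:
    """Stand-in only, to show that one call == one stored initial obs."""

    def __init__(self):
        self.initial_obs = {}

    def add_init_obs(self, episode_id: int, agent_index: int, env_id: int,
                     t: int, init_obs: Any,
                     view_requirements: Dict[str, Any]) -> None:
        self.initial_obs[(episode_id, agent_index)] = init_obs


collector = _FakeCollector()
collector.add_init_obs(episode_id=0, agent_index=0, env_id=0, t=-1,
                       init_obs=[0.0, 0.0], view_requirements={})
assert len(collector.initial_obs) == 1
```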

/ view_req.batch_repeat_value))
repeat_count = (view_req.data_rel_pos_to -
                view_req.data_rel_pos_from + 1)
data = np.asarray([
Contributor

Same as above, a bit confused. Add comments on what these lines of code do.

Contributor Author

done
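A minimal sketch of what the `repeat_count` computation above yields, assuming a view requirement that spans the 50 most recent timesteps (the concrete values are made up for illustration):

```python
# Hypothetical window for an attention net's memory.
data_rel_pos_from = -50  # oldest timestep in the requested window
data_rel_pos_to = -1     # most recent past timestep

# Number of past timesteps pulled into each row of the batch.
repeat_count = data_rel_pos_to - data_rel_pos_from + 1
assert repeat_count == 50
```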

Contributor Author

Provided an example.

shift = view_req.data_rel_pos + obs_shift
# Shift is exactly 0: Use trajectory as is.
if shift == 0:
    data = np_data[data_col][self.shift_before:]
Contributor

Same as before

Contributor Author

Provided an example.
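For readers of this thread, a minimal sketch of the shift == 0 case (names are illustrative, not the collector's exact fields): the buffer keeps `shift_before` extra entries in front of the episode, so the view is just the trajectory with that padding cut off.

```python
import numpy as np

shift_before = 1
buffer_obs = np.array([0.0, 1.0, 2.0, 3.0])  # index 0 is the padding entry

# shift == 0: use the trajectory as-is, minus the leading padding.
data = buffer_obs[shift_before:]
print(data)  # [1. 2. 3.]
```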

[np.zeros(shape=shape, dtype=dtype)
 for _ in range(shift)]

def _get_input_dict(self, view_reqs, abs_pos: int = -1) -> \
Contributor

Add a description of what this method does.

batch = SampleBatch(self.buffers)
assert SampleBatch.UNROLL_ID in batch.data
batch = SampleBatch(
    self.buffers, _seq_lens=self.seq_lens, _dont_check_lens=True)
Contributor

What is _dont_check_lens?

Contributor Author

added explanation.
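A hedged illustration of why a strict length check is skipped here (column names and sizes are assumptions, not RLlib's exact internals): per-timestep columns have sum(seq_lens) rows, while RNN/attention state columns only carry one row per *sequence*, so their lengths differ on purpose.

```python
import numpy as np

seq_lens = np.array([3, 2])            # two sequences: 3 + 2 = 5 timesteps
buffers = {
    "obs": np.zeros((5, 4)),           # one row per timestep
    "state_in_0": np.zeros((2, 8)),    # one row per sequence
}
assert len(buffers["obs"]) == seq_lens.sum()
assert len(buffers["state_in_0"]) == len(seq_lens)
```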

return batch


class _PolicyCollectorGroup:
Contributor

Probably add comments on what this is

for i, seq_len in enumerate(self.seq_lens):
    count += seq_len
    if count >= end:
        data["state_in_0"] = self.data["state_in_0"][state_start:
Contributor

Add comment on what this does
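A rough sketch of the idea (variable names are assumptions, not the exact collector code): a per-timestep range [start, end) has to be mapped onto the sequences covering it, because "state_in_0" stores one entry per sequence rather than one per timestep.

```python
import numpy as np

seq_lens = np.array([3, 2, 4])          # 3 sequences, 9 timesteps total
state_in_0 = np.array([10, 11, 12])     # one state entry per sequence
start, end = 3, 9                       # requested timestep range

bounds = np.concatenate([[0], np.cumsum(seq_lens)])         # [0, 3, 5, 9]
first_seq = int(np.searchsorted(bounds, start, side="right") - 1)
last_seq = int(np.searchsorted(bounds, end, side="left") - 1)
print(state_in_0[first_seq:last_seq + 1])                    # [11 12]
```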

# Range of indices on time-axis, make sure to create
if view_req.data_rel_pos_from is not None:
    ret[view_col] = np.zeros_like([[
        view_req.space.sample()
Contributor

Same, add a comment here.

return x


class PositionalEmbedding(tf.keras.layers.Layer):
Contributor

Add comments on what this does (how it initializes embedding per position based on cos/sin something)
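For reference, a minimal hedged sketch of the usual sinusoidal scheme such a layer implements (standard Transformer-XL-style sin/cos frequencies; this is an illustration under that assumption, not necessarily the exact code added in this PR):

```python
import numpy as np
import tensorflow as tf


class SinusoidalPositionalEmbedding(tf.keras.layers.Layer):
    """Illustrative sin/cos positional embedding (not RLlib's exact layer).

    Position t gets a vector whose first half is sin(t * inv_freq) and whose
    second half is cos(t * inv_freq), with inv_freq spaced as
    1 / 10000^(2i / out_dim).
    """

    def __init__(self, out_dim: int, **kwargs):
        super().__init__(**kwargs)
        self.inverse_freq = tf.constant(
            1.0 / (10000 ** (np.arange(0, out_dim, 2.0) / out_dim)),
            dtype=tf.float32)

    def call(self, seq_len):
        positions = tf.cast(tf.range(seq_len), tf.float32)           # [T]
        angles = tf.einsum("i,j->ij", positions, self.inverse_freq)  # [T, D/2]
        return tf.concat([tf.sin(angles), tf.cos(angles)], axis=-1)  # [T, D]


emb = SinusoidalPositionalEmbedding(out_dim=8)
print(emb(tf.constant(5)).shape)  # (5, 8)
```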

…ectory_view_api_attention_nets

# Conflicts:
#	rllib/agents/trainer.py
#	rllib/evaluation/collectors/simple_list_collector.py
#	rllib/evaluation/tests/test_trajectory_view_api.py
#	rllib/models/tf/attention_net.py
#	rllib/policy/policy.py
#	rllib/policy/torch_policy_template.py
#	rllib/policy/view_requirement.py
#	src/ray/raylet/node_manager.cc
…ectory_view_api_attention_nets

# Conflicts:
#	rllib/agents/trainer.py
#	rllib/evaluation/collectors/simple_list_collector.py
#	rllib/evaluation/tests/test_trajectory_view_api.py
#	rllib/models/tf/attention_net.py
#	rllib/policy/policy.py
#	rllib/policy/torch_policy_template.py
#	rllib/policy/view_requirement.py
#	src/ray/raylet/node_manager.cc
…ectory_view_api_attention_nets

# Conflicts:
#	rllib/agents/ppo/appo_tf_policy.py
#	rllib/agents/ppo/ppo_torch_policy.py
#	rllib/agents/qmix/model.py
#	rllib/evaluation/collectors/simple_list_collector.py
#	rllib/evaluation/rollout_worker.py
#	rllib/evaluation/tests/test_trajectory_view_api.py
#	rllib/models/modelv2.py
#	rllib/policy/dynamic_tf_policy.py
#	rllib/policy/policy.py
#	rllib/policy/sample_batch.py
#	rllib/policy/view_requirement.py
@sven1977 sven1977 closed this Dec 10, 2020
@sven1977
Contributor Author

Moved here:
#12753

Development

Successfully merging this pull request may close these issues.

[RLlib Trajectory View API] Trajectory View API works with our Attention Nets.
