Implement GNN-based PPO graph->node for ray.rllib framework #472

nhuet · 2025-02-21T15:40:33Z

We are dealing here with domains whose observations are graphs and whose actions are nodes of these graphs.

The custom model follows what has been done for the sb3 framework:

- we predict values as before, with GNN feature extraction + reduction to a fixed number of features + MLP;
- we predict actions with a single GNN, which predicts logits for each node without knowing their number in advance.
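To illustrate the two heads described above, here is a minimal NumPy sketch. It is not the model from this PR: the function names, the single mean-aggregation layer, and the linear layer standing in for the MLP are all assumptions made for the example. It only shows that the same weights yield one logit per node whatever the graph size, while the value head pools to a fixed size first.

```python
import numpy as np

rng = np.random.default_rng(0)


def message_passing(x, adj, w_self, w_neigh):
    """One mean-aggregation message-passing layer followed by a ReLU."""
    degree = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)
    neighbours = (adj @ x) / degree
    return np.maximum(x @ w_self + neighbours @ w_neigh, 0.0)


def init_gnn(n_features, hidden):
    """Hypothetical parameter container for the toy GNN."""
    return {
        "w_self": rng.normal(size=(n_features, hidden)),
        "w_neigh": rng.normal(size=(n_features, hidden)),
        "w_out": rng.normal(size=(hidden, 1)),
    }


def node_logits(x, adj, params):
    """Action head: one logit per node, so the output length follows the graph."""
    h = message_passing(x, adj, params["w_self"], params["w_neigh"])
    return (h @ params["w_out"]).squeeze(-1)


def state_value(x, adj, params):
    """Value head: node features are mean-pooled to a fixed size, then projected."""
    h = message_passing(x, adj, params["w_self"], params["w_neigh"])
    pooled = h.mean(axis=0)
    return float(pooled @ params["w_out"][:, 0])


action_gnn = init_gnn(n_features=3, hidden=8)
value_gnn = init_gnn(n_features=3, hidden=8)

for n_nodes in (4, 7):
    x = rng.normal(size=(n_nodes, 3))
    adj = np.ones((n_nodes, n_nodes)) - np.eye(n_nodes)  # fully connected toy graph
    print(node_logits(x, adj, action_gnn).shape, state_value(x, adj, value_gnn))
```

The logits vector has as many entries as the graph has nodes, which is why the downstream buffers cannot assume a fixed action count.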

To use this feature, we need to set the attribute `graph_node_action` to True. The doc of RayRLlib has been enriched to explain what is happening in that case.

The main issue here is that we also have to pad the rollout buffer entries corresponding to `action_dist_inputs` (the logits), as the number of available actions may vary from one observation to another.
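The padding step can be sketched as follows. This is a minimal NumPy illustration, not the code of this PR: the helper name `pad_and_concat` and the padding value (the lowest float32, chosen so that padded logits get zero probability under a softmax) are assumptions.

```python
import numpy as np

# Assumed padding value: the most negative float32.
FLOAT_MIN = np.finfo(np.float32).min


def pad_and_concat(arrays, pad_value):
    """Right-pad 2-D arrays of shape (batch, n_actions) to a common width,
    then concatenate them along the batch axis."""
    width = max(a.shape[1] for a in arrays)
    padded = [
        np.pad(a, ((0, 0), (0, width - a.shape[1])), constant_values=pad_value)
        for a in arrays
    ]
    return np.concatenate(padded, axis=0)


# Two rollout fragments whose graphs had 3 and 5 nodes respectively.
logits_a = np.zeros((2, 3), dtype=np.float32)
logits_b = np.ones((1, 5), dtype=np.float32)

batch = pad_and_concat([logits_a, logits_b], pad_value=FLOAT_MIN)
print(batch.shape)  # (3, 5)
```

Without the padding, `np.concatenate` would fail because the second dimension differs between fragments.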

In case of action masking, a bit more work is necessary:

- Because the length of the action mask (= number of nodes) may vary, the custom model used before (`TorchParametricActionsModel`) does not apply properly, since it uses an action embedding that needs the (max) number of actions in advance. We thus use a simpler model (`TorchMaskedActionsModel`), much like what is done for the sb3 framework: we predict action logits as when no action masking applies, and only at the end do we apply the mask (by adding log(action_mask) to the logits). The main difference is that the custom model does not manage a last layer with weights changed for non-applicable actions; this is managed by the GNN instead.
- Once again, the buffers may list action masks of different sizes, so we need to apply padding before concatenation.
- Lastly, the first dummy samples used to initialize weights are generated by rllib from the observation space of the `AsRLlibMultiAgentEnv`. So, in the action_mask space enriching the observation space, we need to match the default size chosen for the nodes when converting the graph space into a dict space (to be recognized by rllib). This is done in `_create_agent_obs_space_for_rllib()`.
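The masking step described above (adding log(action_mask) to the logits) can be sketched in NumPy as below. This is an illustration under assumptions, not the `TorchMaskedActionsModel` code: the function names are hypothetical, and clamping log(0) to the lowest float32 instead of keeping -inf is an assumption made here to avoid NaNs.

```python
import numpy as np

# Assumed clamp value for log(0): the most negative float32.
FLOAT_MIN = np.finfo(np.float32).min


def mask_logits(logits, action_mask):
    """Add log(action_mask) to the logits; masked entries get a huge negative value."""
    with np.errstate(divide="ignore"):  # log(0) = -inf is expected here
        log_mask = np.log(action_mask.astype(np.float64))
    return logits + np.maximum(log_mask, FLOAT_MIN)


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


logits = np.array([1.0, 2.0, 3.0])
action_mask = np.array([1, 0, 1])  # the second node is not an applicable action

probs = softmax(mask_logits(logits, action_mask))
print(probs)
```

Applicable actions keep their logits unchanged (log(1) = 0), while non-applicable ones end up with zero probability, so the mask can have any length as long as it matches the number of nodes.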

As more action-masking code is added here, we create a directory tree similar to the one used for gnn to group the related code:

- the custom models for action masking are now available in:
  - skdecide/hub/solver/ray_rllib/action_masking/models/tf for tensorflow
  - skdecide/hub/solver/ray_rllib/action_masking/models/torch for torch
- the space conversion utilities (such as keys for true obs and action mask) are in skdecide/hub/solver/ray_rllib/action_masking/utils/spaces/space_utils.py

fteicht · 2025-03-01T08:59:44Z

@nhuet I made a fix for the windows compilation error in master. Can you please synchronise your branch with the latest master before I re-run the github actions of your pull request? Thanks

nhuet · 2025-03-03T08:42:00Z

> @nhuet I made a fix for the windows compilation error in master. Can you please synchronise your branch with the latest master before I re-run the github actions of your pull request? Thanks

Done

nhuet added 4 commits March 3, 2025 09:41

Un-monkey patch ray.rllib after training

73b75c5

This fixes a bug appearing in tests unrelated to gnn but chained after it.
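As a generic illustration of patching and then restoring an attribute (not the mechanism actually used in this PR, which is not shown here), a context manager with a `finally` clause guarantees the original is put back even if training raises. The helper name `patched` and the stand-in module are hypothetical.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def patched(obj, name, replacement):
    """Temporarily replace obj.<name>, restoring the original on exit."""
    original = getattr(obj, name)
    setattr(obj, name, replacement)
    try:
        yield
    finally:
        # Un-patch even if the body raised, so later code sees the original.
        setattr(obj, name, original)


# Stand-in for a library module; a real module object behaves the same way.
fake_lib = SimpleNamespace(concat=lambda items: list(items))
original_concat = fake_lib.concat

with patched(fake_lib, "concat", lambda items: sorted(items)):
    inside = fake_lib.concat([3, 1, 2])  # patched behaviour

after = fake_lib.concat([3, 1, 2])  # original behaviour restored
print(inside, after)
```

Restoring the attribute keeps later tests in the same process from seeing the patched behaviour.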

Add explanation about how we monkey-patch rllib on workers.

1b376b8

nhuet force-pushed the gnn2node-rllib-mask branch from 0653bb6 to 1b376b8 on March 3, 2025 08:41
