
[RLLIB] Offline Training #51747

@dkupsh

Description

What happened + What you expected to happen

I've run into a few issues with offline learning, using both MARWIL and PPO.

  1. First, there is no documentation on saving external experiences as SingleAgentEpisode objects (they can't be saved directly, either). I think it would be worthwhile to have an example like the following in the documentation:
def save_experiences(episodes: list[SingleAgentEpisode], filename: str):
    import msgpack_numpy as mnp
    from ray import data

    # Serialize each episode's state dict with msgpack before writing.
    packed = [mnp.packb(ep.get_state(), default=mnp.encode) for ep in episodes]
    episodes_ds = data.from_items(packed)
    episodes_ds.write_parquet(filename, compression="gzip")

Not sure if it's a bug, but you need code like this to get episodes to load properly (e.g., you can't load a SingleAgentEpisode directly from the OfflinePreLearner class). Right now, you can only load compressed files, and there's no documentation or example for this. A matching loading sketch follows.
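For completeness, here is a minimal loading sketch that mirrors the save path above. It assumes (as my own convention, not a documented API) that data.from_items stored the packed bytes under the "item" column:

import msgpack_numpy as mnp
from ray import data
from ray.rllib.env.single_agent_episode import SingleAgentEpisode


def load_experiences(filename: str) -> list[SingleAgentEpisode]:
    ds = data.read_parquet(filename)
    # Rebuild each episode from its msgpack-packed state.
    return [
        SingleAgentEpisode.from_state(
            mnp.unpackb(row["item"], object_hook=mnp.decode)
        )
        for row in ds.iter_rows()
    ]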

  2. The OfflinePreLearner's learner connector calls GAE on the CPU during initialization of the MARWIL algorithm. The call stack goes from offline_prelearner:249 to general_advantage_estimation:94. Here is the relevant offline config I'm using (alongside a custom model/env):
.learners(
    num_learners=1,
    num_gpus_per_learner=1,
).training(
    train_batch_size_per_learner=32,
).offline_data(
    input_=experiences,
    input_read_episodes=True,
    input_read_batch_size=32,
    map_batches_kwargs={"concurrency": 1},
    iter_batches_kwargs={"prefetch_batches": 1},
    dataset_num_iters_per_learner=1,
)
  3. When using the BC algorithm, the first "running" call of the model receives numpy arrays (gathered from the offline experiences). It seems like the connectors didn't run on the batch (?). This uses the same config as above; see the check sketched below.
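A minimal, hypothetical helper (the name assert_batch_is_torch is mine) that can be dropped at the top of a custom torch RLModule's _forward_train to surface the problem:

import numpy as np


def assert_batch_is_torch(batch: dict) -> None:
    # If the learner connector pipeline had run, every leaf in the train
    # batch would already be a framework tensor, not a numpy array.
    for key, value in batch.items():
        if isinstance(value, np.ndarray):
            raise TypeError(f"Batch key {key!r} is still a numpy array.")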

Versions / Dependencies

Ray, master branch

Reproduction script

import os

import gymnasium as gym
import ray
from ray.rllib.algorithms.marwil import MARWILConfig
from ray.tune import CLIReporter, RunConfig, Tuner

env = gym.make("CartPole-v1")

base_config: MARWILConfig = (
    MARWILConfig()
    .environment(
        "CartPole-v1",
        action_space=env.action_space,
        observation_space=env.observation_space,
    ).learners(
        num_learners=1,
        num_gpus_per_learner=1
    ).training(
        train_batch_size_per_learner=32,
    ).offline_data(
        input_="tests/data/cartpole/cartpole-v1_large",
        map_batches_kwargs={"concurrency": 1},
        iter_batches_kwargs={"prefetch_batches": 1},
        dataset_num_iters_per_learner=1,
    )
)

callbacks = []

os.environ["RAY_AIR_NEW_OUTPUT"] = "0"
os.environ["RAY_verbose_spill_logs"] = "0"

ray.init()

Tuner(
    base_config.algo_class,
    param_space=base_config,
    run_config=RunConfig(
        stop={"training_iteration": 1000},
        callbacks=callbacks,
        progress_reporter=CLIReporter(),
    ),
).fit()

ray.shutdown()

Issue Severity

High: It blocks me from completing my task.

Labels

P3 (moderate in impact or severity), bug, community-backlog, rllib, rllib-offline-rl
