
[RLLIB] Offline Training #51747

@dkupsh

Description

What happened + What you expected to happen

I've run into a few issues with offline learning, using both MARWIL and PPO.

  1. First, there is no documentation on saving external experiences as SingleAgentEpisode objects (they can't be saved directly, either). I think it would be worthwhile to have an example like the following in the documentation:
def save_experiences(episodes: list[SingleAgentEpisode], filename: str):
    import msgpack_numpy as mnp
    from ray import data

    # Serialize each episode's state dict with msgpack before writing.
    packed = [mnp.packb(ep.get_state(), default=mnp.encode) for ep in episodes]
    episodes_ds = data.from_items(packed)
    episodes_ds.write_parquet(filename, compression="gzip")

Not sure if it's a bug, but you need code like this to get episodes to load properly (e.g., you can't load a SingleAgentEpisode directly from the OfflinePreLearner class). Right now, you can only load compressed files, and there's no documentation or example for this. A matching loading sketch follows.
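For completeness, here is a minimal loading sketch that mirrors the save path above. It assumes (as my own convention, not a documented API) that data.from_items stored the packed bytes under the "item" column:

import msgpack_numpy as mnp
from ray import data
from ray.rllib.env.single_agent_episode import SingleAgentEpisode


def load_experiences(filename: str) -> list[SingleAgentEpisode]:
    ds = data.read_parquet(filename)
    # Rebuild each episode from its msgpack-packed state.
    return [
        SingleAgentEpisode.from_state(
            mnp.unpackb(row["item"], object_hook=mnp.decode)
        )
        for row in ds.iter_rows()
    ]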

  2. The OfflinePreLearner's learner connector calls GAE on the CPU during initialization of the MARWIL algorithm. The call stack goes from offline_prelearner:249 to general_advantage_estimation:94. Here is the relevant offline config I'm using (alongside a custom model/env):
.learners(
    num_learners=1,
    num_gpus_per_learner=1,
).training(
    train_batch_size_per_learner=32,
).offline_data(
    input_=experiences,
    input_read_episodes=True,
    input_read_batch_size=32,
    map_batches_kwargs={"concurrency": 1},
    iter_batches_kwargs={"prefetch_batches": 1},
    dataset_num_iters_per_learner=1,
)
  3. When using the BC algorithm, the first "running" call of the model receives numpy arrays (gathered from the offline experiences). It seems like the connectors didn't run on the batch (?). This uses the same config as above; see the check sketched below.
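A minimal, hypothetical helper (the name assert_batch_is_torch is mine) that can be dropped at the top of a custom torch RLModule's _forward_train to surface the problem:

import numpy as np


def assert_batch_is_torch(batch: dict) -> None:
    # If the learner connector pipeline had run, every leaf in the train
    # batch would already be a framework tensor, not a numpy array.
    for key, value in batch.items():
        if isinstance(value, np.ndarray):
            raise TypeError(f"Batch key {key!r} is still a numpy array.")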

Versions / Dependencies

Ray, master branch

Reproduction script

import os

import gymnasium as gym
import ray
from ray.rllib.algorithms.marwil import MARWILConfig
from ray.tune import CLIReporter, RunConfig, Tuner

env = gym.make("CartPole-v1")

base_config: MARWILConfig = (
    MARWILConfig()
    .environment(
        "CartPole-v1",
        action_space=env.action_space,
        observation_space=env.observation_space,
    ).learners(
        num_learners=1,
        num_gpus_per_learner=1
    ).training(
        train_batch_size_per_learner=32,
    ).offline_data(
        input_="tests/data/cartpole/cartpole-v1_large",
        map_batches_kwargs={"concurrency": 1},
        iter_batches_kwargs={"prefetch_batches": 1},
        dataset_num_iters_per_learner=1,
    )
)

callbacks = []

os.environ["RAY_AIR_NEW_OUTPUT"] = "0"
os.environ["RAY_verbose_spill_logs"] = "0"

ray.init()

Tuner(
    base_config.algo_class,
    param_space=base_config,
    run_config=RunConfig(
        stop={"training_iteration": 1000},
        callbacks=callbacks,
        progress_reporter=CLIReporter(),
    ),
).fit()

ray.shutdown()

Issue Severity

High: It blocks me from completing my task.

Labels

P3 (moderate in impact or severity), bug, community-backlog, rllib, rllib-offline-rl
