What happened + What you expected to happen
I've run into a few issues with offline learning, using both MARWIL and PPO.
- First, there's no documentation on saving external experiences as SingleAgentEpisode objects (and they can't be saved directly either). I think it would be worthwhile to include an example like this in the documentation:
import msgpack_numpy as mnp

from ray import data
from ray.rllib.env.single_agent_episode import SingleAgentEpisode


def save_experiences(episodes: list[SingleAgentEpisode], filename: str):
    # Serialize each episode's state with msgpack so it survives the Parquet round-trip.
    packed = [mnp.packb(ep.get_state(), default=mnp.encode) for ep in episodes]
    episodes_ds = data.from_items(packed)
    episodes_ds.write_parquet(filename, compression="gzip")
    # Free the dataset and the in-memory episodes once written.
    del episodes_ds
    episodes.clear()
I'm not sure if it's a bug, but code like this is required to get the episodes to load properly (i.e., you can't load SingleAgentEpisode objects directly via the OfflinePreLearner class). Right now, you can only load compressed files, and there's no documentation or example covering this; a loading sketch follows.
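For completeness, here is a minimal loading sketch, assuming the episodes were written with the save_experiences() helper above. The load_experiences name is my own, and I'm relying on data.from_items() storing each record under an "item" column; this is not an official RLlib API:

import msgpack
import msgpack_numpy as mnp

from ray import data
from ray.rllib.env.single_agent_episode import SingleAgentEpisode


def load_experiences(filename: str) -> list[SingleAgentEpisode]:
    # Hypothetical inverse of save_experiences(): read the msgpack-packed
    # states back from Parquet and rebuild the episodes.
    episodes_ds = data.read_parquet(filename)
    return [
        # data.from_items() puts each record under the "item" column.
        SingleAgentEpisode.from_state(
            msgpack.unpackb(row["item"], object_hook=mnp.decode)
        )
        for row in episodes_ds.take_all()
    ]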
- OfflinePreLearner's learner connector runs GAE on the CPU during initialization of the MARWIL algorithm. The call stack goes from offline_prelearner:249 to general_advantage_estimation:94. Here is the relevant offline config I'm using (alongside a custom model/env):
.learners(
    num_learners=1,
    num_gpus_per_learner=1,
).training(
    train_batch_size_per_learner=32,
).offline_data(
    input_=experiences,
    input_read_episodes=True,
    input_read_batch_size=32,
    map_batches_kwargs={"concurrency": 1},
    iter_batches_kwargs={"prefetch_batches": 1},
    dataset_num_iters_per_learner=1,
)
- When using the BC algorithm, the first "running" call of the model receives NumPy arrays (gathered from the offline experiences). It seems the connectors didn't run on the batch (?). This uses the same config as above; a small diagnostic sketch follows.
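As a quick way to confirm the symptom, here is a minimal diagnostic sketch; check_batch_types is a hypothetical helper of mine, assuming a PyTorch RLModule whose forward batch arrives as a plain dict:

import numpy as np
import torch


def check_batch_types(batch: dict) -> None:
    # Report which batch entries are still NumPy arrays (connector pipeline
    # skipped) versus proper framework tensors.
    for key, value in batch.items():
        if isinstance(value, np.ndarray):
            print(f"{key}: np.ndarray - connector pipeline likely didn't run")
        elif isinstance(value, torch.Tensor):
            print(f"{key}: torch.Tensor on device {value.device}")
        else:
            print(f"{key}: {type(value).__name__}")

Calling this at the top of the module's _forward_train() (or wherever the batch first arrives) makes the NumPy-vs-tensor mismatch visible on the very first iteration.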
Versions / Dependencies
Ray, master branch
Reproduction script
import os

import gymnasium as gym
import ray
from ray.air import RunConfig
from ray.rllib.algorithms.marwil import MARWILConfig
from ray.tune import CLIReporter, Tuner

env = gym.make("CartPole-v1")

base_config: MARWILConfig = (
    MARWILConfig()
    .environment(
        "CartPole-v1",
        action_space=env.action_space,
        observation_space=env.observation_space,
    )
    .learners(
        num_learners=1,
        num_gpus_per_learner=1,
    )
    .training(
        train_batch_size_per_learner=32,
    )
    .offline_data(
        input_="tests/data/cartpole/cartpole-v1_large",
        map_batches_kwargs={"concurrency": 1},
        iter_batches_kwargs={"prefetch_batches": 1},
        dataset_num_iters_per_learner=1,
    )
)

callbacks = []

ray.init()
os.environ["RAY_AIR_NEW_OUTPUT"] = "0"
os.environ["RAY_verbose_spill_logs"] = "0"

Tuner(
    base_config.algo_class,
    param_space=base_config,
    run_config=RunConfig(
        stop={"training_iteration": 1000},
        callbacks=callbacks,
        progress_reporter=CLIReporter(),
    ),
).fit()
ray.shutdown()
Issue Severity
High: It blocks me from completing my task.