Reinforcement Learning Architecture #156
Suggestions for general architecture improvements:
```python
def train_ppo_agent(agent, env, episodes=1000, steps_per_episode=200):
    for episode in range(episodes):
        state = env.reset()
        episode_reward = 0
        for step in range(steps_per_episode):
            action = agent.get_action(state)
            next_state, reward, done, _ = env.step(action)
            episode_reward += reward
            # Store experience (state, action, reward, next_state, done)
            # ...
            state = next_state
            if done:
                break
```
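The experience storage is elided above. A minimal sketch of a buffer that could back it is shown below; the `RolloutBuffer` name is an assumption, but the fields match the attributes that `create_dataset_from_buffer` reads further down (`advantages` would be filled in after the rollout, e.g. via GAE):

```python
class RolloutBuffer:
    """Minimal on-policy experience container (sketch, not project code)."""

    def __init__(self):
        self.states, self.actions, self.rewards = [], [], []
        self.next_states, self.dones, self.advantages = [], [], []

    def store(self, state, action, reward, next_state, done):
        # Append one transition; advantages are computed later from the rewards.
        self.states.append(state)
        self.actions.append(action)
        self.rewards.append(reward)
        self.next_states.append(next_state)
        self.dones.append(done)
```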
Possible performance improvements:
```python
import tensorflow as tf

def prepare_dataset(dataset, batch_size, buffer_size):
    # Preprocess, shuffle, batch, and prefetch so the input pipeline overlaps training.
    dataset = dataset.map(preprocess_data)
    dataset = dataset.shuffle(buffer_size)
    dataset = dataset.batch(batch_size)
    dataset = dataset.prefetch(1)
    return dataset

def create_dataset_from_buffer(buffer):
    # Convert the rollout buffer's fields to tensors before building the tf.data pipeline.
    states = tf.convert_to_tensor(buffer.states, dtype=tf.float32)
    actions = tf.convert_to_tensor(buffer.actions, dtype=tf.float32)
    rewards = tf.convert_to_tensor(buffer.rewards, dtype=tf.float32)
    next_states = tf.convert_to_tensor(buffer.next_states, dtype=tf.float32)
    dones = tf.convert_to_tensor(buffer.dones, dtype=tf.float32)
    advantages = tf.convert_to_tensor(buffer.advantages, dtype=tf.float32)
    dataset = create_dataset(states, actions, rewards, next_states, dones, advantages)
    return dataset
```
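`create_dataset` is not defined in the snippet. A minimal sketch of what it could look like, assuming it simply zips the stacked tensors into a `tf.data.Dataset`:

```python
def create_dataset(states, actions, rewards, next_states, dones, advantages):
    # Hypothetical helper (not defined in the original snippet): slice the
    # stacked tensors element-wise so each dataset element is one transition.
    return tf.data.Dataset.from_tensor_slices(
        (states, actions, rewards, next_states, dones, advantages)
    )

# Example wiring (assumed): build the dataset from a filled buffer, then run it
# through prepare_dataset (preprocess_data is assumed to be defined elsewhere).
# train_data = prepare_dataset(create_dataset_from_buffer(buffer),
#                              batch_size=64, buffer_size=2048)
```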
Optimization of the architecture, with a special focus on the separation of the agent and the environment.
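One possible way to enforce that separation (a sketch only; the `Agent` protocol and `run_episode` names are assumptions, not project code): the agent exposes `get_action`/`update` and never touches the environment, while a driver loop owns all `env.reset()`/`env.step()` calls.

```python
from typing import Any, Protocol

class Agent(Protocol):
    # The agent only maps observations to actions and learns from stored
    # transitions; it never calls into the environment itself.
    def get_action(self, state: Any) -> Any: ...
    def update(self, buffer: Any) -> None: ...

def run_episode(agent: Agent, env, buffer, max_steps: int = 200) -> float:
    # The driver loop is the only place that touches env.reset()/env.step(),
    # so environments can be swapped without changing the agent.
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.get_action(state)
        next_state, reward, done, _ = env.step(action)
        buffer.store(state, action, reward, next_state, done)
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward
```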