Reinforcement Learning Architecture #156
Suggestions for general architecture improvements:
```python
def train_ppo_agent(agent, env, episodes=1000, steps_per_episode=200):
    for episode in range(episodes):
        state = env.reset()
        episode_reward = 0
        for step in range(steps_per_episode):
            action = agent.get_action(state)
            next_state, reward, done, _ = env.step(action)
            episode_reward += reward
            # Store experience (state, action, reward, next_state, done)
            # ...
            state = next_state
            if done:
                break
```
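The experience storage is elided above. A minimal sketch of a buffer that could back it is shown below; the `RolloutBuffer` name is an assumption, but the fields match the attributes that `create_dataset_from_buffer` reads further down (`advantages` would be filled in after the rollout, e.g. via GAE):

```python
class RolloutBuffer:
    """Minimal on-policy experience container (sketch, not project code)."""

    def __init__(self):
        self.states, self.actions, self.rewards = [], [], []
        self.next_states, self.dones, self.advantages = [], [], []

    def store(self, state, action, reward, next_state, done):
        # Append one transition; advantages are computed later from the rewards.
        self.states.append(state)
        self.actions.append(action)
        self.rewards.append(reward)
        self.next_states.append(next_state)
        self.dones.append(done)
```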
Possible performance improvements:
```python
import tensorflow as tf

def prepare_dataset(dataset, batch_size, buffer_size):
    # Preprocess, shuffle, batch, and prefetch so the input pipeline overlaps training.
    dataset = dataset.map(preprocess_data)
    dataset = dataset.shuffle(buffer_size)
    dataset = dataset.batch(batch_size)
    dataset = dataset.prefetch(1)
    return dataset

def create_dataset_from_buffer(buffer):
    # Convert the rollout buffer's fields to tensors before building the tf.data pipeline.
    states = tf.convert_to_tensor(buffer.states, dtype=tf.float32)
    actions = tf.convert_to_tensor(buffer.actions, dtype=tf.float32)
    rewards = tf.convert_to_tensor(buffer.rewards, dtype=tf.float32)
    next_states = tf.convert_to_tensor(buffer.next_states, dtype=tf.float32)
    dones = tf.convert_to_tensor(buffer.dones, dtype=tf.float32)
    advantages = tf.convert_to_tensor(buffer.advantages, dtype=tf.float32)
    dataset = create_dataset(states, actions, rewards, next_states, dones, advantages)
    return dataset
```
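`create_dataset` is not defined in the snippet. A minimal sketch of what it could look like, assuming it simply zips the stacked tensors into a `tf.data.Dataset`:

```python
def create_dataset(states, actions, rewards, next_states, dones, advantages):
    # Hypothetical helper (not defined in the original snippet): slice the
    # stacked tensors element-wise so each dataset element is one transition.
    return tf.data.Dataset.from_tensor_slices(
        (states, actions, rewards, next_states, dones, advantages)
    )

# Example wiring (assumed): build the dataset from a filled buffer, then run it
# through prepare_dataset (preprocess_data is assumed to be defined elsewhere).
# train_data = prepare_dataset(create_dataset_from_buffer(buffer),
#                              batch_size=64, buffer_size=2048)
```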
Optimization of the architecture, with a special focus on the separation of the agent and the environment.
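One possible way to enforce that separation (a sketch only; the `Agent` protocol and `run_episode` names are assumptions, not project code): the agent exposes `get_action`/`update` and never touches the environment, while a driver loop owns all `env.reset()`/`env.step()` calls.

```python
from typing import Any, Protocol

class Agent(Protocol):
    # The agent only maps observations to actions and learns from stored
    # transitions; it never calls into the environment itself.
    def get_action(self, state: Any) -> Any: ...
    def update(self, buffer: Any) -> None: ...

def run_episode(agent: Agent, env, buffer, max_steps: int = 200) -> float:
    # The driver loop is the only place that touches env.reset()/env.step(),
    # so environments can be swapped without changing the agent.
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.get_action(state)
        next_state, reward, done, _ = env.step(action)
        buffer.store(state, action, reward, next_state, done)
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward
```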