
bench.monitor.load_results should have an option to only load last n datapoints #563

Open
jckastel opened this issue Nov 18, 2019 · 1 comment
Labels: enhancement (New feature or request)

Comments

@jckastel

When using Monitor in a callback function as described here, there should be an option to only load the last n lines into the dataframe. Otherwise, training performance will degrade over time as the load_results function loads increasingly large csv files, even though only the last n lines are used. An example of how to do this efficiently can be found in the first answer here (replacing StringIO.StringIO with io.StringIO).

I implemented it in my own code (which does not support JSON monitor files) as follows:

from collections import deque
from glob import glob

import io
import os
import json
import pandas

from stable_baselines.bench import Monitor


def load_results(path, num_lines=100):
    """
    Load the last num_lines results from each monitor file in a directory.

    :param path: (str) the directory containing the monitor files.
    :param num_lines: (int) the number of most recent lines to read per file.
    :return: (Pandas DataFrame) the data.
    """
    # get csv files (Monitor.EXT == "monitor.csv")
    monitor_files = glob(os.path.join(path, "*" + Monitor.EXT))
    if not monitor_files:
        raise ValueError("no monitor files of the form *%s found in %s" % (Monitor.EXT, path))
    data_frames = []
    headers = []
    for file_name in monitor_files:
        with open(file_name, 'r') as file_handler:
            # the first line is a commented JSON header written by Monitor
            first_line = file_handler.readline()
            assert first_line[0] == '#'
            header = json.loads(first_line[1:])

            # the second line holds the csv column names (r, l, t)
            second_line = file_handler.readline()

            # a bounded deque keeps only the last num_lines rows in memory
            data = deque(file_handler, num_lines)

            if not data:
                continue

            # the csv header row was consumed above, so prepend it again
            data = [second_line] + list(data)

            data_frame = pandas.read_csv(io.StringIO(''.join(data)), index_col=None)

            headers.append(header)
            data_frame['t'] += header['t_start']
            data_frames.append(data_frame)
    if len(data_frames) > 0:
        data_frame = pandas.concat(data_frames)
        data_frame.sort_values('t', inplace=True)
        data_frame.reset_index(inplace=True)
        data_frame['t'] -= min(header['t_start'] for header in headers)
        return data_frame

A full fix would need to add JSON support back in, and could perhaps have num_lines<=0 read the full file and make that the default.
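
For reference, a minimal sketch of how this could be hooked into a stable-baselines callback. The log directory and the 1000-step throttle are my own illustrative assumptions, not part of the original report:

# Usage sketch (assumptions: log dir "/tmp/gym/", reading every 1000th call).
# Throttling keeps the csv parsing cost off the training hot path.
n_calls = [0]

def monitor_callback(_locals, _globals):
    n_calls[0] += 1
    if n_calls[0] % 1000 == 0:
        data_frame = load_results("/tmp/gym/", num_lines=100)
        if data_frame is not None and len(data_frame) > 0:
            # 'r' is the per-episode reward column written by Monitor
            print("mean recent reward:", data_frame['r'].mean())
    return True  # returning False stops training in stable-baselines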

@Miffyli added the enhancement (New feature or request) label Nov 18, 2019
@Miffyli (Collaborator) commented Nov 18, 2019

Admittedly, loading the .csv file constantly could become an overhead if your environment has short episodes, but this is why the example only reads the file after every 1000th training step. However, unless your .csv file is tens of megabytes in size, this overhead should be negligible compared to the time it takes to gather samples and run updates. The shared code seems to do the trick, but it is also tied specifically to how Monitor writes the files and requires pandas (though that dependency could be replaced).

A dirty alternative is to read episode stats from the agent in the callback, but this is messy as algorithms have different ways of storing this data. E.g. here is a piece of code I used in a monitor callback with PPO, ACER and some other algorithms:

import numpy as np

# Inside the training callback function (_locals, _globals)
if "ep_info_buf" in _locals.keys():
    # PPO stores recent episode info in the ep_info_buf deque
    ep_info = _locals["ep_info_buf"]
    # Take a bunch of the latest games and see if we have
    # a new best average reward
    rewards = [ep["r"] for ep in ep_info]
    rewards = rewards[-20:]
    mean_reward = np.mean(rewards)
elif "episode_stats" in _locals.keys():
    # ACER tracks stats in an episode_stats object
    mean_reward = _locals["episode_stats"].mean_reward()
...

Generally, I think callbacks should have better and more unified access to episode statistics for this kind of plotting. Some algorithms already store the last N episodes, which could be made available in the callbacks for easier monitoring/plotting.
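
As a rough sketch of that idea (the episode_rewards key below is hypothetical, not an existing stable-baselines API):

import numpy as np

# Hypothetical sketch: every algorithm exposes the same buffer of recent
# episode rewards to the callback. `episode_rewards` is an invented name,
# not part of stable-baselines.
def callback(_locals, _globals):
    recent = _locals.get("episode_rewards", [])
    if recent:
        mean_reward = float(np.mean(recent[-20:]))
        print("mean reward over last 20 episodes:", mean_reward)
    return True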
