A list of work that needs to be done on the varcompfa
package.
- Random seeding for fixed policies, agents, etc.
- Validity of shoe-horning in a terminal update for OpenAI gym
- History callback that can record parameter values
- Flat history callback
- Targeted history callback (e.g., one that computes derived quantities on its own and stores them under an appropriate key/value in a dictionary)
- History 'filtration' and serialization (HDF5? CSV?)
- Parameter-setting callbacks
- Check serialization via `json_tricks`
- Terminal progress widget using Blessed
- Web-based dashboard/remote monitor (using `react-blessed` or something)
- Figure out a way of making the agent class more efficient
- Currently it computes the feature vector too many times
- Caching the result somehow would definitely help (but `joblib` doesn't do this properly/efficiently); see the caching sketch after this list
- State-dependent action-probabilities for control algorithms
- Binary vector tiling class
- A full example of the experiment pipeline
- Define a learning agent, have it learn a policy, then freeze the policy
- Use the frozen policy in a series of policy evaluation experiments
- Performing full MC-rollouts from various start states sampled from the environment
- Record and analyze the results
- Make plots illustrating the results
- Plotting functions
- Need some boilerplate code that can produce graphs for publication
- Faster way of recording the results (`json_tricks` feels the strain, `json` doesn't quite work, `pickle` is not ideal either)
- It seems faster when compression is used
- However, even with compression we tend to see very large files...
- More efficient way of recording experiments
- Command line interface
- Add more documentation once the code is more or less finalized
- HTML documentation (Sphinx + Napoleon?)
- Extend the Makefile to automate generating the docs, updating the version, etc.
- Set up scripts for running the experiments
- This could be done as a Makefile, but may be better as a Python script
- Add the ability to add comments/tags to experiments
- Kuhn-triangulation / online representation refinement
- Parallelization of the code (via Joblib? IPython Kernels? Celery?)
- Finish adding a `get_config()` method to all classes, or just use `json_tricks`?
- Agents
- Features
- Environments(?)
- Experiments
- Automatic versioning
- Global logging setup (see `openai/gym` for an example)
- Figure out a way to tag experiments/add more metadata to them so that their purpose is clear weeks/months afterwards.
- Specify parameters either as a number or as a class + args (see the parameter-spec sketch after this list)
- Specify algorithms similarly
- For features, probably something more in-depth is required.
- Specify the controller either as an absolute path or by searching for it
- Add episodic callbacks back in.
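
On the feature-vector caching item above: a minimal sketch of one possible approach, memoizing the most recently computed features so that repeated calls with the same observation within a step come for free. The `FeatureCache` class and the `phi` callable are hypothetical, not part of the current `varcompfa` API.

```python
import numpy as np


class FeatureCache:
    """Cache the most recently computed feature vector.

    Avoids recomputing phi(obs) when the same observation is passed
    several times within a single step (e.g. by the learner, the policy,
    and a history callback).
    """
    def __init__(self, phi):
        self._phi = phi          # feature function, e.g. a tile coder
        self._last_obs = None    # observation that produced the cached vector
        self._last_feat = None   # cached feature vector

    def __call__(self, obs):
        # Recompute only when the observation actually changed
        if self._last_obs is None or not np.array_equal(obs, self._last_obs):
            self._last_obs = np.array(obs, copy=True)
            self._last_feat = self._phi(obs)
        return self._last_feat
```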
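On specifying parameters as either a number or a class + args: one possible shape for such a spec and its resolution, purely illustrative (the dict format, the `registry` mapping, and `DecayingStepSize` are assumptions, not existing code).

```python
def resolve_param(spec, registry):
    """Resolve a parameter spec into a usable value.

    `spec` may be a plain number (used as a constant), or a dict of the
    form {"class": "SomeName", "args": {...}} naming a class in `registry`.
    """
    if isinstance(spec, (int, float)):
        return spec
    cls = registry[spec["class"]]
    return cls(**spec.get("args", {}))


# Hypothetical usage: a constant step size vs. a decaying one
# alpha = resolve_param(0.1, registry)
# alpha = resolve_param({"class": "DecayingStepSize", "args": {"initial": 0.5}}, registry)
```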
Some parts of the code need test coverage soon:
- `Agent`
- `Experiment`
- Partially completed, need to wait for rest of API to settle down
- `Callback`
- Various analysis functions (e.g., for computing the return; see the sketch below)
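
On the analysis functions: computing the discounted return from a recorded reward sequence is the obvious first candidate. A small self-contained sketch; the function name and the convention that `rewards[t]` is the reward following step `t` are just for illustration.

```python
import numpy as np


def returns(rewards, gamma):
    """Compute the discounted return G_t for every step of an episode.

    G_t = r_{t+1} + gamma * r_{t+2} + ..., computed via a backwards pass.
    """
    g = 0.0
    out = np.zeros(len(rewards))
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        out[t] = g
    return out


# Example: returns([1, 0, 2], gamma=0.9) -> [2.62, 1.8, 2.0]
```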
I have ideas on how to make a generic MDP framework so that we don't have to write a separate class for every MDP we wish to use.
This would be helpful because the papers we wish to examine define MDPs that differ only slightly from each other, so implementing each one separately would mean a lot of copy-pasting, which is non-ideal.
Once it works we will have a single class to test, and so even if we do end up defining a separate gym environment for each MDP, we can at least proceed with confidence in the backend they run on.
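
A rough sketch of what such a generic class might look like, driven entirely by transition and reward tables so that each paper's variant is just a different pair of arrays. The interface below mimics gym's `reset`/`step` but is illustrative, not a committed design.

```python
import numpy as np


class TabularMDP:
    """A finite MDP defined by its transition and reward tables.

    P[s, a, s'] is the probability of moving from s to s' under action a,
    R[s, a, s'] the corresponding expected reward. Different MDPs from the
    literature then differ only in the arrays passed in, not in code.
    """
    def __init__(self, P, R, terminal_states=(), rng=None):
        self.P = np.asarray(P)
        self.R = np.asarray(R)
        self.terminal_states = set(terminal_states)
        self.rng = rng or np.random.default_rng()
        self.state = 0

    def reset(self, state=0):
        self.state = state
        return self.state

    def step(self, action):
        probs = self.P[self.state, action]
        next_state = int(self.rng.choice(len(probs), p=probs))
        reward = self.R[self.state, action, next_state]
        done = next_state in self.terminal_states
        self.state = next_state
        return next_state, reward, done, {}
```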
- Not super stoked on `Experiment.learners` as a list of learning agents... but `agents` doesn't really capture it either.
- Reorganize namespace to increase clarity
- See TensorFlow as an example of how to prune the namespace w/ `inspect`
- Improve the code for feature generation so that the DAG structure is more coherent
- Remove 'child' attributes, compose as functions instead
- Profile the code
- Add more test coverage to the overall code base
- Uniform hashing tile coding
- Better progress bar
- Live dashboard
- Remote monitor
- Implement tile coding and other discretization/feature functions in C.
- Determine if it would be worthwhile to implement 'cascading' algorithms that can run online in an experiment class
- It is manifestly desirable past a certain point, but it would require a different setup to record these sorts of experiments in a modular way (as we have with callbacks)
- Perhaps via some sort of `LoggingAgent`? (see the sketch at the end of this list)
- Implement the frontend using JavaScript, either by hand or with one of the dashboard libraries (OpenMCT or similar)
- Have the data POST to the dashboard using `requests`
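
On the `LoggingAgent` idea and POSTing data to the dashboard: a rough sketch of a wrapper that forwards a summary of each update to a remote endpoint via `requests`. The wrapped agent's `update(transition)` interface, the transition dict, and the endpoint URL are assumptions for illustration only.

```python
import requests


class LoggingAgent:
    """Wrap an agent and POST a summary of every update to a dashboard.

    Assumes the wrapped agent exposes an `update(transition)` method and
    that a dashboard is listening at `endpoint`; both are hypothetical.
    """
    def __init__(self, agent, endpoint="http://localhost:8000/metrics"):
        self.agent = agent
        self.endpoint = endpoint

    def update(self, transition):
        result = self.agent.update(transition)
        payload = {"reward": float(transition["reward"]), "result": result}
        try:
            requests.post(self.endpoint, json=payload, timeout=1.0)
        except requests.RequestException:
            pass  # don't let monitoring failures interrupt the experiment
        return result
```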