Add eligibility trace variants in algorithms #12
Comments
@mathemage, let me know if you want to take this; it would be a good "intro to RL and Haskell" issue.
Hi @stites! Sorry for the delay, I was overwhelmed with work. Sure, I'd like to take it if it's still available. Can you describe in more detail what it is about? Perhaps a JIRA issue, or at least a split into subtasks of what needs to be done; I've got no idea where to begin or what I am supposed to do. Thanks!
Very exciting! You are actually well timed. So there are two sections of algorithms which I think would be nice to include: the foundational RL algorithms, and some more comprehensive deep learning variations. This falls in the first category (and, as I mentioned, is possibly a good way to get started with Haskell); perhaps later, if you are interested, you can also help out with the next section. The current plan is to use GitHub for issue management, so we'll just iterate on this ticket. You can also ping me on gitter via the datahaskell group if you want a faster turnaround on any Q&A you might have. I'll fill out more details now and include a checklist for you to work off of; if any of those items gets too big, feel free to create a new issue and link this ticket. Right now, I think the best thing to do might be just downloading the repo, running stack, and building the demo in …
After filling out this ticket, I think the right thing to do is to treat this as your epic. I'm going to copy/paste each item on this ticket as a new issue, which I'll assign to you, and I'll keep this list updated. Update: it looks like I can't assign you a ticket until you submit your first PR, but #10 is actually the first item on this list.
@stites Cool, I'll get to it when I have more spare time (two crazy weeks coming up now). There's no deadline for this, right?
haha, there are no deadlines in open source. If someone else wants to take this from you, I'll just send you a ping. |
If you're unfamiliar with eligibility traces, they basically unify temporal-difference learning with Monte Carlo methods: essentially you hold a buffer in memory of an agent's experience and perform reward discounting across each step's trace. You might also want to check out n-step returns as the inverse of eligibility traces (i.e., "looking into the future instead of looking into the past"), although n-step is more compute-heavy and, thus, less important. The primary reference to ramp up on this kind of knowledge would be the second edition of the reinforcement learning book, chapter 12 (June draft link, perma link, code).
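To make the idea concrete, here is a minimal sketch of one trace-based backup over a tabular value function in Haskell. The module, type names, and `traceStep` are hypothetical illustration rather than anything in this repo; it only shows the mechanics described above (decay every trace, bump the current state's trace, then nudge all values along their traces by the TD error).

```haskell
-- A minimal sketch of one TD(lambda) backup on a tabular value function.
-- Everything here (module name, ValueTable, Trace, traceStep) is hypothetical
-- illustration, not part of this repo's API.
module TraceSketch where

import qualified Data.Map.Strict as Map

type ValueTable = Map.Map Int Double  -- V(s), keyed by a discrete state id
type Trace      = Map.Map Int Double  -- e(s), the eligibility of each state

-- | One backup: decay every trace by gamma * lambda, bump the trace of the
-- state just visited (an "accumulating" trace), then nudge every state's
-- value in proportion to its trace and the current TD error.
traceStep
  :: Double              -- ^ alpha, the step size
  -> Double              -- ^ gamma, the discount factor
  -> Double              -- ^ lambda, the trace-decay parameter
  -> Int                 -- ^ s, the state just left
  -> Double              -- ^ delta, the TD error: r + gamma * V(s') - V(s)
  -> (ValueTable, Trace)
  -> (ValueTable, Trace)
traceStep alpha gamma lambda s delta (values, traces) =
  let decayed = Map.map (* (gamma * lambda)) traces
      traces' = Map.insertWith (+) s 1 decayed
      bump st e acc = Map.insertWith (+) st (alpha * delta * e) acc
      values' = Map.foldrWithKey bump values traces'
  in (values', traces')
```

With lambda = 0 this collapses to ordinary TD(0) (only the current state gets credit); with lambda = 1 and no discounting it approaches a Monte Carlo update, which is the unification mentioned above.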
@mathemage, here is what I think would be a good series of steps for getting started with this implementation. While the first item will get you acquainted with the current code, I think it would be best to hard-code as much as possible for your PR, and we can start a feature branch.
1. Build an example using current code. Use `reinforce-algorithms` to come up with an example of using the current algorithm interfaces (`Reinforce.Algorithms`) and the Q-Table "backend" (`Reinforce.Agents`). This would go into the `reinforce-zoo` folder and would be a good introduction to the current internals. You can open a new ticket for this if it takes a long time.
2. Decide on, and create a ticket for, a function approximator backend. This repository is lacking a function approximator backend (some linear model or an nn-backend). You may have to depend on hmatrix or accelerate (I would suggest looking at accelerate, although hmatrix may be more beginner friendly). There is a prototype linear approximator written in hmatrix here -- it's BSD3 licensed so feel free to copy/paste at your convenience. Put this in `Reinforce.Agents.<Your backend name>`. A small hmatrix-flavoured sketch appears after this list.
3. Hard-code TD(λ). Currently, `reinforce-algorithms` is split into algorithms and backends (as you will have figured out through the first item). Ignore this and hard-code a TD(λ) algorithm in `Reinforce/Algorithms/QLearning/EligibilityTrace.hs`. You can depend on `reinforce-environments*` if it makes things easier. Pseudocode for this can be found on page 307 of Barto and Sutton; a rough standalone sketch also follows this list.
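For item 2, here is a hedged sketch of the general shape a linear value approximator could take with hmatrix: semi-gradient TD(λ) over a weight vector and a trace vector. The module and function names are hypothetical, the feature map is assumed to be supplied by the caller, and nothing here is copied from the prototype linked above.

```haskell
-- A possible shape for a linear value-function backend using hmatrix.
-- The names (LinearVSketch, valueOf, tdLambdaUpdate) and the choice of
-- passing the feature vector phi(s) in directly are hypothetical, not this
-- repo's interface.
module LinearVSketch where

import Numeric.LinearAlgebra (Vector, dot, fromList, konst, scale)

-- | v(s) is approximated as the dot product of the weights with phi(s).
valueOf :: Vector Double -> Vector Double -> Double
valueOf w phiS = w `dot` phiS

-- | Semi-gradient TD(lambda): decay the trace vector, add the current
-- feature vector to it, then move the weights along the trace scaled by
-- the step size and the TD error.
tdLambdaUpdate
  :: Double -> Double -> Double     -- ^ alpha, gamma, lambda
  -> Vector Double                  -- ^ phi(s), the features of the state left
  -> Double                         -- ^ delta = r + gamma * v(s') - v(s)
  -> (Vector Double, Vector Double) -- ^ (weights, trace)
  -> (Vector Double, Vector Double)
tdLambdaUpdate alpha gamma lambda phiS delta (w, e) =
  let e' = scale (gamma * lambda) e + phiS
      w' = w + scale (alpha * delta) e'
  in (w', e')

-- Example call with a 3-dimensional feature vector and zeroed weights/trace:
-- tdLambdaUpdate 0.1 0.99 0.9 (fromList [1, 0, 0.5]) 0.25 (konst 0 3, konst 0 3)
```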
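For item 3, here is a rough standalone sketch of what a hard-coded trace algorithm might look like in plain Haskell: a Sarsa(λ)-style episode loop with accumulating traces over a toy, purely functional environment. `Env`, `step`, `reset`, `epsilonGreedy`, and the module itself are hypothetical scaffolding rather than the `reinforce-environments*` interface, and this is not a transcription of the book's pseudocode.

```haskell
-- A rough sketch of hard-coding a trace-based control algorithm: a
-- Sarsa(lambda)-style episode loop with accumulating traces over a toy,
-- purely functional environment. Env, step, reset, epsilonGreedy, and the
-- module itself are hypothetical scaffolding, not this repo's API.
module EligibilityTraceSketch where

import qualified Data.Map.Strict as Map
import           System.Random (StdGen, randomR)

type State  = Int
type Action = Int
type QTable = Map.Map (State, Action) Double
type Trace  = Map.Map (State, Action) Double

-- | A toy environment: a start state and a step function returning
-- (next state, reward, done).
data Env = Env
  { reset :: State
  , step  :: State -> Action -> (State, Double, Bool)
  }

qLookup :: QTable -> State -> Action -> Double
qLookup q s a = Map.findWithDefault 0 (s, a) q

-- | Pick the greedy action with probability (1 - eps), otherwise act randomly.
epsilonGreedy :: Double -> StdGen -> QTable -> State -> [Action] -> (Action, StdGen)
epsilonGreedy eps gen q s actions
  | roll < eps = let (i, g'') = randomR (0, length actions - 1) g'
                 in (actions !! i, g'')
  | otherwise  = (snd (maximum [ (qLookup q s a, a) | a <- actions ]), g')
  where
    (roll, g') = randomR (0.0, 1.0) gen

-- | Run one episode of tabular Sarsa(lambda) with accumulating traces:
-- bump the trace of the (state, action) just taken, back up every entry of
-- the Q-table along its trace, then decay all traces by gamma * lambda.
sarsaLambdaEpisode
  :: Double -> Double -> Double   -- ^ alpha, gamma, lambda
  -> [Action]                     -- ^ the (finite, non-empty) action space
  -> Env
  -> (StdGen, QTable)
  -> (StdGen, QTable)
sarsaLambdaEpisode alpha gamma lambda actions env (gen0, q0) =
  let s0         = reset env
      (a0, gen1) = epsilonGreedy 0.1 gen0 q0 s0 actions
  in go gen1 q0 Map.empty s0 a0
  where
    go gen q traces s a
      | done      = (gen', q')
      | otherwise = go gen' q' traces' s' a'
      where
        (s', r, done) = step env s a
        (a', gen')    = epsilonGreedy 0.1 gen q s' actions
        target
          | done      = r
          | otherwise = r + gamma * qLookup q s' a'
        delta   = target - qLookup q s a
        bumped  = Map.insertWith (+) (s, a) 1 traces
        q'      = Map.foldrWithKey
                    (\k e acc -> Map.insertWith (+) k (alpha * delta * e) acc)
                    q bumped
        traces' = Map.map (* (gamma * lambda)) bumped
```

Once something like this works end to end, it should be much easier to see how to fold it back into the split between `Reinforce.Algorithms` and the backends.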