Motivation
Hey,
the DDPG tutorial has me pooping my pants. I want to suggest an example of creating a simple DDPG (or similar) agent that just acts, observes, and gets the job done, for a researcher looking to implement an RL algorithm on their own environment that has nothing to do with the usual benchmarking environments, i.e. just applying RL to their specific field.
This video advertises being able to use components without having to use the rest of the library, and I want to believe it. But when I look at the docs page I see a lot of components I don't know how to use, and when I look into the docs of a specific component I find that it takes arguments that are an interface to something I've never heard of, with an abstract name. Not to sound ignorant, but I feel like I have to know the entire framework just to use one part of it, which goes against the core idea as I understand it.
Solution
Like, I have my own environment that's completely numpy and doesn't have anything to do with Gym or anything else, and I want to have the following workflow:
class MyAgent:
    def __init__(self, **kwargs):
        # torchrl code goes here
        # how to init networks
        # how to init a replay buffer, a simple one
        # how to init objectives like DDPGLoss
        ...

    def act(self, state):
        # how to produce an action with the actor network (or more likely actor module)
        # how to add noise
        ...

    def observe(self, s, action, new_s, reward):
        # how to put a transition into the replay buffer
        # how to update the neural networks:
        # how to sample from the replay buffer, how to use the objectives,
        # how to backpropagate, how to soft-update
        ...

env = MyEnv()      # isn't made with torchrl
agent = MyAgent()  # class made with torchrl
s = env.reset()    # initial state
for t in range(T):
    action = agent.act(s)
    new_s, reward = env.step(action)         # could be converted to output tensordicts
    agent.observe(s, action, new_s, reward)  # observe the transition and update the model
Just the "RL for dummies" toy example. For those of us who don't need transforms and parallelization just yet; we can get into that once we've got the basics working. Like, I found the component's I need - soft update, ddpg loss... I just don't know how to put them together without the monstrosity of the code that is DDPG tutorial.
Alternatives
/
Additional context
/
Checklist
[x] I have checked that there is no similar issue in the repo (required)
I've found this issue that hits the spot, but I don't know if it amounted to anything, and my issue leans towards providing an example of this low-level functionality.
This issue is also pretty good, but I'd aim for something even simpler, and especially for the environment not to need to be a TorchRL env.
Conclusion
Those were my two cents. I hope I've hit the target with them. If there's something like this already available and I just haven't found it yet, please do let me know.
The example you're giving is to the point.
Are you saying that, in your opinion, the first thing someone should be taught is how to build an environment from scratch? I would guess that many newcomers would like to get their hands on a more "concrete", pre-packaged problem, like solving an existing task (i.e. a Gym environment or similar).
I'll be working on improving the docs this week and the next; I'll make sure that we get things sorted for you.
We can definitely do a better job at documenting the single features of the library.
It's sort of a Rubik's cube, where you solve one face but it messes up the face on the other side:
If we do tutorials/examples about single features, it is confusing for newcomers, as they need to go through multiple tutorials to understand how to code a simple "go from A to B" policy.
If we go for the full tutorial, there is quickly too much info to digest in one go.
Happy to hear your thoughts on what would make it easy for you to get started with the lib!