
[Feature Request] Tutorial on how to build the simplest agent #897

Open
viktor-ktorvi opened this issue Feb 6, 2023 · 1 comment
Labels: enhancement (New feature or request)

@viktor-ktorvi

Motivation

Hey,

the DDPG tutorial scared the pants off me. I want to suggest an example of creating a simple DDPG (or similar) agent that just acts, observes, and gets the job done, aimed at a researcher looking to implement an RL algorithm on their own environment, one that has nothing to do with the usual benchmarking environments, i.e. just applying RL to their specific field.

This video advertises being able to use components without having to use the rest of the library, and I want to believe it. But when I look at the docs page I see a lot of components that I don't know how to use, and when I look into the docs of a specific component I find that it takes arguments that are an interface to something I know nothing about, with an abstract name. Not to sound ignorant, but I feel like I have to know the entire framework just to use one part of it, which is against the core idea as I understand it.

Solution

Like, I have my own environment that's written entirely in numpy and has nothing to do with Gym or anything else, and I want to have the following workflow:

class MyAgent:
    def __init__(self, **kwargs):
        # torchrl code goes here:
        # how to init the networks
        # how to init a replay buffer (a simple one)
        # how to init objectives like DDPGLoss

    def act(self, state):
        # how to produce an action with the actor network (or, more likely, actor module)
        # how to add exploration noise

    def observe(self, s, action, new_s, reward):
        # how to put a transition into the replay buffer
        # how to update the neural networks:
        # how to sample from the replay buffer, how to use the objectives,
        # how to backpropagate, how to soft-update the targets

env = MyEnv()      # isn't made with torchrl
agent = MyAgent()  # class made with torchrl

s = env.reset()  # initial state

for t in range(T):
    action = agent.act(s)
    new_s, reward = env.step(action)  # could be converted to output tensordicts
    agent.observe(s, action, new_s, reward)  # observe the transition and update the model
Just the "RL for dummies" toy example, for those of us who don't need transforms and parallelization just yet; we can get into those once we've got the basics working. Like, I found the components I need (soft update, DDPG loss, ...), I just don't know how to put them together without the monstrosity of code that is the DDPG tutorial.
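For reference, here's my rough, untested guess at how those pieces might fit together, stitched from skimming the docs. Everything in it is an assumption on my part: the tensordict key names ("observation", the nested "next" entries), the DDPGLoss/SoftUpdate signatures, the network sizes, the noise scale and the hyperparameters may all be off or differ across versions, but it shows the level of abstraction I'm after:

import torch
from torch import nn
from tensordict import TensorDict
from tensordict.nn import TensorDictModule
from torchrl.data import LazyTensorStorage, TensorDictReplayBuffer
from torchrl.objectives import DDPGLoss, SoftUpdate

class MyAgent:
    def __init__(self, obs_dim, act_dim, batch_size=64, capacity=100_000):
        # plain PyTorch nets, wrapped so the loss can route tensors to them by key
        self.actor = TensorDictModule(
            nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim)),
            in_keys=["observation"],
            out_keys=["action"],
        )

        class QNet(nn.Module):
            def __init__(self):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Linear(obs_dim + act_dim, 64), nn.Tanh(), nn.Linear(64, 1)
                )

            def forward(self, observation, action):
                return self.net(torch.cat([observation, action], dim=-1))

        self.value = TensorDictModule(
            QNet(), in_keys=["observation", "action"], out_keys=["state_action_value"]
        )

        self.loss_module = DDPGLoss(self.actor, self.value)
        self.target_updater = SoftUpdate(self.loss_module, eps=0.995)
        self.optim = torch.optim.Adam(self.loss_module.parameters(), lr=3e-4)
        self.buffer = TensorDictReplayBuffer(storage=LazyTensorStorage(capacity))
        self.batch_size = batch_size

    def act(self, state):
        td = TensorDict(
            {"observation": torch.as_tensor(state, dtype=torch.float32)}, batch_size=[]
        )
        with torch.no_grad():
            action = self.actor(td)["action"]
        # crude Gaussian exploration noise; torchrl has dedicated wrappers for this
        return action + 0.1 * torch.randn_like(action)

    def observe(self, s, action, new_s, reward, done=False):
        transition = TensorDict(
            {
                "observation": torch.as_tensor(s, dtype=torch.float32),
                "action": action,
                "next": {
                    "observation": torch.as_tensor(new_s, dtype=torch.float32),
                    "reward": torch.tensor([reward], dtype=torch.float32),
                    "done": torch.tensor([done]),
                },
            },
            batch_size=[],
        )
        self.buffer.add(transition)

        if len(self.buffer) >= self.batch_size:
            batch = self.buffer.sample(self.batch_size)
            losses = self.loss_module(batch)  # tensordict with "loss_actor" / "loss_value"
            loss = losses["loss_actor"] + losses["loss_value"]
            self.optim.zero_grad()
            loss.backward()
            self.optim.step()
            self.target_updater.step()  # soft-update the target network(s)

If something at roughly this granularity existed as a documented example, I think it would cover the "just act and observe" use case completely.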

Alternatives

/

Additional context

/

Checklist

  • [x] I have checked that there is no similar issue in the repo (required)
    I've found this issue that hits the spot, but I don't know if it amounted to anything, and my issue leans more towards providing an example of this low-level functionality.

This issue is also pretty good, but I'd aim for something even simpler, and especially for the environment not to have to be a torchrl one.

Conclusion

Those were my two cents. I hope I've hit the target with them. If there's something like this already available and I just haven't found it yet, please do let me know.

@viktor-ktorvi viktor-ktorvi added the enhancement New feature or request label Feb 6, 2023
@vmoens
Contributor

vmoens commented Feb 7, 2023

Thanks for the useful feedback!

The example you're giving is to the point.
Are you saying that, in your opinion, the first thing someone should be taught is how to build an environment from scratch? I would guess that many newcomers would rather get their hands on a more "concrete", pre-packaged problem, like solving an existing task (i.e. a gym environment or similar).

Let's keep track of this in #883

I'll be working on improving the docs this week and the next; I'll make sure that we get things sorted for you.

We can definitely do a better job at documenting the individual features of the library.
It's sort of a Rubik's cube, where you solve one face but mess up the face on the other side:

  • If we do tutorials/examples about single features, it is confusing for newcomers, as they need to go through multiple tutorials to understand how to code a simple "go from A to B" policy.
  • If we go for the full tutorial, there is quickly too much info to digest in one go.

Happy to hear your thoughts on what would make it easy for you to get started with the lib!
