Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Twrl-rebased #14

Merged
merged 11 commits into from
Dec 15, 2016
Merged

Twrl-rebased #14

merged 11 commits into from
Dec 15, 2016

Conversation

SeanNaren
Copy link
Collaborator

Based on #8, merged v1 as well into branch. Let me know if there are any remaining issues!

@Kaixhin
Copy link
Owner

Kaixhin commented Dec 2, 2016

Sorry am trying to finish off a paper due mid-NIPS before I head off to NIPS, so will have to put this on hold for a week. Have started taking a look but definitely need to have a thorough look through the whole thing before merging to master.

@SeanNaren
Copy link
Collaborator Author

@Kaixhin sounds good, enjoy NIPS :)

@Kaixhin
Copy link
Owner

Kaixhin commented Dec 13, 2016

Remind what the conclusion on timeStepLimit and maxSteps was? Currently the env will run to the minimum of these, rather than maxSteps overriding timeStepLimit.

Also. in gym, what's reward_threshold? Looking at MountainCar, it seems like a scaled version of rlenvs' rewardSpace, but only limited one way.

@SeanNaren
Copy link
Collaborator Author

SeanNaren commented Dec 14, 2016

Currently the default maximum time step per episode is 1000. This can be changed by the environment via timeStepLimit in the Env.lua class, and by maxSteps set by the user in the options. However if maxSteps is greater than timeStepLimit (and not null) it is set to timeStepLimit

Does that make sense? Any changes you would make?

@Kaixhin
Copy link
Owner

Kaixhin commented Dec 14, 2016

Seems a little counterintuitive to have timeStepLimit overwrite maxSteps, but if that's what gym does then let's stick to that for consistency. If so then go ahead and merge this pull request!

I see that reward_threshold is used to determine if an environment is "solved", but that's not of concern to this library immediately.

@SeanNaren
Copy link
Collaborator Author

I think as kory said this was so that each episode is a maximum fixed size for their leaderboards! And sounds good, will just do one quick pass through and merge :)

@SeanNaren SeanNaren merged commit 82a5084 into master Dec 15, 2016
@SeanNaren SeanNaren deleted the twrl-rebase branch December 15, 2016 10:08
@SeanNaren
Copy link
Collaborator Author

Thanks for your help @Kaixhin, was great stuff!

@Kaixhin
Copy link
Owner

Kaixhin commented Dec 15, 2016

Thanks a lot @SeanNaren! I'll check if anything needs to be backported to v1 myself, so good luck with twrl integration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants