DQN example: target DQN == behavior DQN (bug? or by design?) #32

Open
gordicaleksa opened this issue Feb 25, 2021 · 1 comment

@gordicaleksa

Hi!

Did you make the target DQN and the behavior DQN the same network on purpose, following "Algorithm 1" from the original 2013 arXiv paper?

The paper initially states that the DQN should be frozen and used as the target net (for stability),
but later, in "Algorithm 1", the authors (probably by mistake) use the same theta params for both nets.

@Ethan00Si

I think it's a bug. You can get more information from this issue.
