Which OBL level was uploaded in the March 2021 push? #28
Comments
The OBL agent available here is a level 4 agent, so it is not the grounded policy. That is very abnormal. How do you play with the agent? If you are using the UI in the SPARTA repo and the…
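For reference, the level numbering follows the recursion described in the OBL paper. A schematic sketch in Python (the function names are placeholders, not the repo's training API):

```python
# Schematic of the OBL level recursion (placeholder functions, not the
# repo's API): each level trains a best response against beliefs induced
# by the policy one level below, starting from uniform-random play.

def train_belief(policy):
    """Stand-in for fitting a belief over private hands under `policy`."""
    return f"belief({policy})"

def best_response(belief):
    """Stand-in for training a policy that is optimal under `belief`."""
    return f"pi_vs[{belief}]"

policy = "uniform_random"  # the grounded starting point, pi_0
for level in range(1, 5):
    policy = best_response(train_belief(policy))
# After four iterations `policy` is a level-4 agent; levels >= 2 can
# develop conventions, so it need not match the grounded policy.
```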
Hi hengyuan, does it make sense to interpret the public-private network as a way of accelerating training and inference?
My understanding is that, instead of each agent individually unfolding their…
@hengyuan-hu Thanks for the response on the OBL level, that makes sense to me. @keenlooks did most of the work to make the OBL model work with a slightly modified version of the webapp from the SPARTA repo, but from what I gather, he didn't use the…
@hoseasiu that's correct. I created a…
@keenlooks That sounds right. Have you checked the self-play score of the converted JIT model?
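Before running a full self-play evaluation, a cheap sanity check is to confirm that the converted TorchScript model reproduces the eager model's outputs. A minimal sketch, assuming a stand-in policy module (the real OBL network and its observation size differ):

```python
import torch
import torch.nn as nn

# Stand-in for the policy network the checkpoint was loaded into;
# the real OBL model and its input dimensions differ.
class TinyPolicy(nn.Module):
    def __init__(self, obs_dim=658, num_actions=20):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, num_actions)
        )

    def forward(self, obs):
        return self.net(obs)

eager = TinyPolicy().eval()
jit_model = torch.jit.script(eager)  # the "converted JIT model"

obs = torch.randn(64, 658)  # batch of random observations
with torch.no_grad():
    e, j = eager(obs), jit_model(obs)
torch.testing.assert_close(e, j)                 # logits must match
assert torch.equal(e.argmax(-1), j.argmax(-1))   # so do greedy actions
print("TorchScript model matches eager model on random inputs")
```

Matching logits on random inputs does not replace checking the self-play score, but it quickly catches conversion bugs such as mismatched input ordering.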
@peppacat If we used a private LSTM, then we would need to sample not only a hand (o_private) but the entire history of my hand (tau_private). We therefore need a network structure whose recurrent part does not depend on tau_private; both the feed-forward and the public-private networks satisfy this requirement.
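As a concrete illustration of that constraint, here is a minimal public-private sketch in PyTorch (layer sizes and the elementwise fusion are illustrative, not the exact architecture from the repo): the LSTM consumes only public features, so resampling o_private touches only the feed-forward branch and never requires re-unrolling the recurrence over tau_private.

```python
import torch
import torch.nn as nn

class PublicPrivateNet(nn.Module):
    """Recurrence over public features only; private features enter
    through a feed-forward branch, so sampling a new private hand does
    not require replaying the trajectory through the LSTM."""

    def __init__(self, pub_dim, priv_dim, hid=256, num_actions=20):
        super().__init__()
        self.pub_lstm = nn.LSTM(pub_dim, hid, batch_first=True)
        self.priv_mlp = nn.Sequential(nn.Linear(priv_dim, hid), nn.ReLU())
        self.head = nn.Linear(hid, num_actions)

    def forward(self, pub_obs, priv_obs, hidden=None):
        # pub_obs: [batch, time, pub_dim]; priv_obs: [batch, time, priv_dim]
        pub_out, hidden = self.pub_lstm(pub_obs, hidden)
        fused = pub_out * self.priv_mlp(priv_obs)  # elementwise fusion
        return self.head(fused), hidden

net = PublicPrivateNet(pub_dim=100, priv_dim=50)
pub = torch.randn(2, 8, 100)
# Many sampled private hands can reuse the same recurrent pass:
for _ in range(3):
    priv = torch.randn(2, 8, 50)
    logits, _ = net(pub, priv)
```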
By "newly drawn," I just mean that it's the newest card in the bot's hand. It will have been there for at least long enough for it to receive two hints that give it full information on the card, but in the interim, the bot didn't draw anything new, so it's still the newest card. In the cases we saw, the bot would play that newest card after the second applicable hint, even though when taken together, the revealed information on that card gave it perfect information that the card was in fact not playable. We can post some examples the next time we test. |
@keenlooks I have not checked the self-play score of the converted JIT model. @hoseasiu do you know if you all have been able to check that yet?
Hi Hengyuan,
We've been trying out the OBL model you uploaded, and it's a very good agent - certainly the most human-like and performant of the learning-based agents I've played with. Two questions came up when we tried it that we were hoping you could clarify.
The paper refers to multiple levels of OBL bots, but only one was uploaded, and it wasn't clear from the README or the bot name which one this was. Which level was it? In our (human) interactions with it, it occasionally played cards without full information, especially when given a hint on a newly drawn card. That seems to deviate from the optimal grounded policy, which suggests higher-level OBL behavior to me - is that right?
We also noticed that the bot sometimes makes incorrect play attempts on cards with full information, again typically when the cards are newly drawn and hinted at. This looks like a case where a convention learned at higher levels overrides the optimal grounded policy. Is that consistent with your experience?
Thanks!
Hosea