
Compatibility with the Newest Overcooked-AI Env #4

Open
muzhancun opened this issue Jan 14, 2024 · 0 comments

I am trying to pair your reproduced baselines (downloaded from Google Drive) with my own human proxy model trained in the new Overcooked-AI environment (with old dynamics).
To make them compatible, I first use the old lossless_state_encoding function for the baselines (only for the baseline models, since my human proxy is trained in the new env).
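
Roughly, that first step looks like this; old_lossless_state_encoding is just a stand-in name for the encoding function I copied over from the old codebase, and how it gets attached to the baseline agent is elided:

def old_encoding_featurize(state):
    # Stand-in: apply the old per-player lossless encoding so the loaded
    # baselines see the observation planes they were trained on, while my
    # human proxy keeps using the new env's featurize_state_mdp.
    return old_lossless_state_encoding(base_env.mdp, state)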
Also, the new env requires TensorFlow 2 while your models were trained with TensorFlow 1, so I load your models with the following code:

import tensorflow as tf

def get_model_policy_from_saved_model(save_dir, sim_threads=30):
    """Get a policy function from a TF1 SavedModel restored under TF2."""
    predictor = tf.saved_model.load(save_dir)
    # Query the default serving signature and take the action distribution output.
    step_fn = lambda obs: predictor.signatures["serving_default"](tf.convert_to_tensor(obs, dtype=tf.float32))["action_probs"]
    return get_model_policy(step_fn, sim_threads)

However, this raises warnings like:

WARNING:tensorflow:Unable to create a python object for variable <tf.Variable 'agent0/ppo2_model/pi/conv_0/kernel:0' shape=(3, 3, 25, 25) dtype=float32_ref> because it is a reference variable. It may not be visible to training APIs. If this is a problem, consider rebuilding the SavedModel after running tf.compat.v1.enable_resource_variables().
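
As a quick check that the restore is still usable despite the warning, I run the serving signature on a dummy batch (the path and the observation shape below are placeholders; the shape has to match the old lossless encoding for the layout, with 25 channels as in the kernel shape above):

import numpy as np
import tensorflow as tf

predictor = tf.saved_model.load("path/to/mep_baseline")  # placeholder path to a downloaded baseline
# Placeholder observation: batch of one, (H, W, C) must match the old encoding.
dummy_obs = np.zeros((1, 5, 4, 25), dtype=np.float32)
probs = predictor.signatures["serving_default"](tf.convert_to_tensor(dummy_obs))["action_probs"]
print(probs.shape, float(tf.reduce_sum(probs[0])))  # should be a valid distribution summing to ~1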

The code to pair the MEP baseline with my human proxy model is as follows:

def evaluate_hp_mep(hp_model_path, mep_model_path, layout, order=0):
    # Load the human proxy (BC) model and wrap it as an agent that uses the
    # new env's featurization.
    hp_model, hp_params = load_bc_model(hp_model_path)
    hp_policy = BehaviorCloningPolicy.from_model(
        hp_model, hp_params, stochastic=True
    )
    base_ae = _get_base_ae(hp_params)
    base_env = base_ae.env
    hp_agent = RlLibAgent(hp_policy, order, base_env.featurize_state_mdp)

    # Load the MEP baseline from the saved TF1 model.
    mep_agent = get_agent_from_saved_model(mep_model_path, sim_threads=30)

    # Evaluate on the given layout with old dynamics, horizon 400.
    ae = AgentEvaluator.from_layout_name(
        mdp_params={"layout_name": layout, "old_dynamics": True},
        env_params={"horizon": 400},
    )

    # `order` controls which seat the human proxy takes.
    if order == 0:
        ap = AgentPair(hp_agent, mep_agent)
    else:
        ap = AgentPair(mep_agent, hp_agent)
    result = ae.evaluate_agent_pair(ap, 5, 400)
    return result, result["ep_returns"]
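
For completeness, this is how I call it (the model paths and layout are placeholders for my local setup; order=0 seats the human proxy first, order=1 seats the MEP baseline first):

for order in (0, 1):
    result, returns = evaluate_hp_mep(
        "path/to/hp_model", "path/to/mep_model", "cramped_room", order=order
    )
    # Average episode return over the 5 evaluation games.
    print(f"order={order}: mean return {sum(returns) / len(returns):.1f}")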

But the evaluation results are quite unsatisfying. Am I missing some steps, or do the warnings matter?
