The project started out by forking an older version of MineRL to get human interaction with the agent working, which was broken in the original JarvisVLA MineStudio implementation. I then noticed JarvisVLA failing at simple tasks like mining oak logs in MineRL 0.3.7's Minecraft env, while succeeding in the original JarvisVLA repo's Minecraft env.
The embedded JarvisVLA-oak is a fork of JarvisVLA that runs on my machine (Ubuntu 22.04.5 LTS, RTX 3090 Ti with CUDA 13) without conda. Follow the install instructions in the README and run "./run_oak_log_10.sh" to test JarvisVLA in the oak log gathering task for 10 iterations.
```bash
./full_install.sh
source agent_env/bin/activate
pip install -r requirements_agent.txt
python -m venv vllm_env
source vllm_env/bin/activate
pip install -r requirements_vllm.txt
```

Activate the agent venv:

```bash
source agent_env/bin/activate
```

In another terminal, run the VLLM server (hosted on port 3000 by default):
```bash
source vllm_env/bin/activate
./vllm.sh
```

Run the agent on your task (in a terminal with agent_env active):
```bash
python agent.py --task "<your task prompt>" --craft <item id>
```

Here `--task` is the text prompt sent to the VLA, and `--craft` is the programmatic name of a Minecraft item id that triggers environment completion when it is detected in the inventory.
For example, to prompt the agent to get oak logs:

```bash
python agent.py --task "harvest oak logs from the tree" --craft oak_log
```

This launches the Minecraft environment, and the agent takes actions until max_steps is reached or oak_log is obtained.
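For reference, this is roughly the kind of inventory check that ends the episode once the `--craft` item appears. It is a minimal sketch assuming MineRL-style observations with an `inventory` dict of item counts; the function name is illustrative, not taken from agent.py.

```python
# Sketch of a completion check on the --craft item (illustrative, not agent.py's code).
def episode_done(obs: dict, craft_item: str, step: int, max_steps: int) -> bool:
    inventory = obs.get("inventory", {})       # MineRL-style item counts
    if int(inventory.get(craft_item, 0)) > 0:  # target item obtained
        return True
    return step >= max_steps                   # otherwise stop at the step budget
```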
To interact with the agent during inference, open another terminal while the agent is still taking actions in the environment and run:

```bash
python -m minerl.interactor 6666
```

This automatically spawns a new Minecraft client and connects you to the agent's server.
JarvisVLA is very brittle. I tried several variations of the prompt "hit tree and get logs," and across all of them the success rate was low (<10%). In the successful cases, JarvisVLA spawned with the tree trunk in the center of the screen, within hitting range or only one or two blocks outside it.
The two main failure modes were tree detection and attack spamming. In the first case, the agent would start hitting objects that are not trees (dirt blocks or leaf blocks). In the second, it spent all of its steps attacking and doing nothing else.
I suspect the failures are caused by running the agent in a different Minecraft version (1.12.1) and with different graphics settings relative to its training data.
So I tested with the official JarvisVLA repo:

```bash
cd JarvisVLA-oak
./run_oak_log_10.sh
```

Result videos are saved in JarvisVLA-oak/logs/.
Both qualitatively and quantitatively, the agent performs MUCH better: a 70% success rate over 10 tries, and more complex behavior like moving around to actively find trees!
To replicate the JarvisVLA-oak environment exactly, I copied over its options.txt file while removing newer features. The primary changes needed to match MineStudio were switching MineRL's old FOV (130, Quake Pro mode) to 70 (MineStudio's default), setting gamma (brightness) to 2.0, and doubling the particle quality.
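For illustration, here is a minimal sketch of patching those values into options.txt before launch. The key names follow the vanilla options.txt format, but the stored value encodings (especially for FOV) and the file path depend on the MineRL/Malmo build, so treat them as assumptions.

```python
from pathlib import Path

# Assumed overrides mirroring MineStudio's settings; value encodings may differ per build
# (e.g. vanilla stores FOV as (degrees - 70) / 40, so 0.0 corresponds to 70 degrees).
OVERRIDES = {"fov": "0.0", "gamma": "2.0", "particles": "0"}

def patch_options(options_path: str) -> None:
    """Rewrite matching key:value lines in a Minecraft options.txt file."""
    path = Path(options_path)
    patched = []
    for line in path.read_text().splitlines():
        key = line.split(":", 1)[0]
        patched.append(f"{key}:{OVERRIDES[key]}" if key in OVERRIDES else line)
    path.write_text("\n".join(patched) + "\n")

# Path is illustrative; point it at the options.txt your MineRL build actually reads.
patch_options("path/to/minecraft/run/options.txt")
```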
But I was still seeing large differences in the agent's behavior, which made me suspect something else was wrong. Specifically, the agent was getting stuck in loops where it would output just the attack token over and over again.
Example log:

```
[Step 728] Getting action from agent...
task: Mine the oak log
[VLLM] Calling with 5 messages
[VLLM] Response: <|reserved_special_token_178|><|reserved_special_token_204|><|reserved_special_token_219|><|reserved...
[VLLM] Time: 410.6ms
[VLLM] Extracted 5 special tokens: [151835, 151861, 151876, 151897, 151836]...
wall clock time: 458.30
Action: forward=0, jump=0, attack=1, camera=[0. 0.]
```

I rechecked the special token → MineRL action mapping to confirm it is correct, as well as the message formatting being sent to VLLM.
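To make that check concrete, here is a minimal sketch of the decoding step in question. The token ids and their groupings below are hypothetical placeholders, not JarvisVLA's actual mapping; the action dict follows MineRL's conventions.

```python
import numpy as np

# Hypothetical token tables; JarvisVLA's real special-token ids and groupings differ.
BUTTON_TOKENS = {151001: "forward", 151002: "jump", 151003: "attack"}
CAMERA_TOKENS = {151101: np.array([0.0, 0.0]), 151102: np.array([0.0, 10.0])}

def tokens_to_action(token_ids):
    """Decode the VLA's special tokens into a MineRL-style action dict."""
    action = {"forward": 0, "jump": 0, "attack": 0, "camera": np.zeros(2)}
    for tid in token_ids:
        if tid in BUTTON_TOKENS:
            action[BUTTON_TOKENS[tid]] = 1   # binary button press
        elif tid in CAMERA_TOKENS:
            action["camera"] = CAMERA_TOKENS[tid]  # pitch/yaw delta in degrees
    return action
```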
I don't know the exact cause without exhaustively ablating the working JarvisVLA-oak implementation to reproduce the behavior seen in MineRL 0.3.7.
But given that the OpenAI-format message sent to the VLLM server hosting JarvisVLA is exactly the same in both repositories, I can say the cause is some Minecraft environment mismatch.
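For context, here is a minimal sketch of the kind of OpenAI-format chat-completion request both repositories send to the VLLM server. The model name, prompt, and image payload are placeholders; the port matches the default mentioned above.

```python
import base64
import requests

def query_vla(frame_png: bytes, task: str, port: int = 3000) -> dict:
    """POST one frame plus the task prompt to a vLLM OpenAI-compatible endpoint."""
    image_b64 = base64.b64encode(frame_png).decode()
    payload = {
        "model": "jarvis-vla",  # placeholder; use whatever model name vllm.sh serves
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                {"type": "text", "text": task},
            ],
        }],
        "max_tokens": 16,
    }
    resp = requests.post(f"http://localhost:{port}/v1/chat/completions", json=payload)
    resp.raise_for_status()
    return resp.json()
```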
This was my first venture into getting a "VLA" model to work, and it immediately ran into brittleness issues from small environment changes. It gave me a taste of how subtle shifts in the environment can break models that are strong on paper.
Changes made to the forked MineRL/Malmo code to get it running (the Python-side fixes are sketched below):

- `spaces.py` - Removed `self.shape = ()`
- `core.py` - Fixed `collections.Mapping` → `collections.abc.Mapping`
- `observables.py` - Fixed `np.int` → `int`
- `MalmoEnvServer.java` - Added UUID generation (50+ lines)
- `build.gradle` - Configured for local MixinGradle
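The Python-side entries are small compatibility shims for newer Python and NumPy versions. A sketch of the direction of those fixes (the surrounding MineRL code is omitted and the function names are illustrative):

```python
from collections.abc import Mapping  # collections.Mapping was removed in Python 3.10
import numpy as np

def is_mapping(obs) -> bool:
    # core.py-style fix: isinstance checks must use collections.abc.Mapping.
    return isinstance(obs, Mapping)

def as_index(value) -> int:
    # observables.py-style fix: np.int was removed in NumPy 1.24; cast with the
    # builtin int (or np.int64 when a fixed-width dtype is required).
    return int(np.asarray(value).item())
```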