
NetHack: add new model #295

Merged
merged 2 commits into alex-petrenko:master on Mar 5, 2024

Conversation

BartekCupial
Collaborator

Motivation

This architecture reaches much better returns for multiple reasons listed below; it is featured in a recent paper (the current SOTA). The architecture was first introduced in Scaling Laws for Imitation Learning in NetHack.

Credit: Jens Tuyls https://github.com/jens321/

Architecture Details (from the paper)

We use two main architectures for all our experiments, one for the BC experiments and another for
the RL experiments.

BC architecture. The NLD-AA dataset comprises ttyrec-formatted trajectories, which are
24 × 80 ASCII character and color grids (one for each) along with the cursor position. To encode
these, we modify the architecture used in Hambro et al., resulting in the following:

  • Dungeon encoder. This component encodes the main observation in the game, which is
    a 21 × 80 grid per time step. Note that the top row and bottom two rows are cut off, as those
    are fed into the message and bottom line statistics encoders, respectively. We embed each
    character and color in an embedding lookup table, after which we concatenate them and
    place them in their respective positions in the grid. We then feed this embedded grid into a
    ResNet, which consists of 2 identical modules, each using 1 convolutional layer followed by
    a max pooling layer and 2 residual blocks (of 2 convolutional layers each), for a total of 10
    convolutional layers, closely following the setup in Espeholt et al. (A rough sketch of this
    encoder appears after this list.)
  • Message encoder. The message encoder takes the top row of the grid, converts each ASCII
    character into a one-hot vector, and concatenates these, resulting in an 80 × 256 = 20,480
    dimensional vector representing the message. This vector is then fed into a 2-layer MLP,
    resulting in the message representation. (See the sketch after this list.)
  • Bottom line statistics. To encode the bottom line statistics, we flatten the bottom two
    rows of the grid and create a “character-normalized” (subtract 32 and divide by 96) and
    “digits-normalized” (subtract 47 and divide by 10, mask out ASCII characters smaller than
    45 or larger than 58) input representation, which we then stack, resulting in a 160 × 2
    dimensional input. This closely follows the Sample Factory model used in Hambro et al.
    (See the sketch after this list.)
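The following is a minimal PyTorch sketch of the dungeon encoder described above. The embedding dimensions and channel counts are my own assumptions (they are not specified in this description), so treat this as an illustration rather than the exact model added in this PR:

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with a skip connection."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        out = self.conv1(torch.relu(x))
        out = self.conv2(torch.relu(out))
        return x + out


class DungeonEncoder(nn.Module):
    """Embeds the 21 x 80 char/color grid and runs it through a small ResNet."""

    def __init__(self, char_dim: int = 32, color_dim: int = 16, channels: int = 64):
        super().__init__()
        self.char_emb = nn.Embedding(256, char_dim)   # ASCII characters
        self.color_emb = nn.Embedding(32, color_dim)  # terminal colors
        in_channels = char_dim + color_dim

        layers = []
        for _ in range(2):  # 2 identical modules -> 10 conv layers in total
            layers += [
                nn.Conv2d(in_channels, channels, kernel_size=3, padding=1),
                nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
                ResidualBlock(channels),
                ResidualBlock(channels),
            ]
            in_channels = channels
        self.resnet = nn.Sequential(*layers)

    def forward(self, chars, colors):
        # chars, colors: [batch, 21, 80] integer grids
        x = torch.cat([self.char_emb(chars), self.color_emb(colors)], dim=-1)
        x = x.permute(0, 3, 1, 2)  # to [batch, emb, 21, 80]
        return self.resnet(x).flatten(1)
```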
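
Similarly, here is a hedged sketch of the message encoder and the bottom line statistics normalization. The MLP width and the function names are hypothetical, but the one-hot size (80 × 256) and the normalization constants follow the description above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MessageEncoder(nn.Module):
    """One-hot encodes the 80-character top row and feeds it to a 2-layer MLP."""

    def __init__(self, hidden_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(80 * 256, hidden_dim),  # 20,480-dimensional one-hot input
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, message_chars):
        # message_chars: [batch, 80] integer ASCII codes of the top row
        one_hot = F.one_hot(message_chars.long(), num_classes=256).float()
        return self.mlp(one_hot.flatten(1))


def encode_bottom_line(blstats_chars):
    # blstats_chars: [batch, 160] ASCII codes of the flattened bottom two rows
    x = blstats_chars.float()
    char_norm = (x - 32.0) / 96.0                           # "character-normalized"
    digit_mask = (x >= 45) & (x <= 58)                      # keep digit-like codes only
    digit_norm = ((x - 47.0) / 10.0) * digit_mask.float()   # "digits-normalized"
    return torch.stack([char_norm, digit_norm], dim=-1)     # [batch, 160, 2]
```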

@BartekCupial
Collaborator Author

I plan to add a report with experiments in the next PR.

@codecov-commenter

codecov-commenter commented Mar 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.83%. Comparing base (6c3ee69) to head (d2f14b8).

❗ Current head d2f14b8 differs from pull request most recent head fd46d79. Consider uploading reports for the commit fd46d79 to get more accurate results

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #295   +/-   ##
=======================================
  Coverage   77.83%   77.83%           
=======================================
  Files         101      101           
  Lines        7773     7773           
=======================================
  Hits         6050     6050           
  Misses       1723     1723           


@BartekCupial BartekCupial merged commit 071574b into alex-petrenko:master Mar 5, 2024
8 checks passed