Motivation
This architecture reaches much better returns for several reasons detailed below, and it is featured in a recent paper (the current SOTA). The architecture was first introduced in Scaling Laws for Imitation Learning in NetHack.
Credit: Jens Tuyls https://github.com/jens321/
Architecture Details (from the paper)
We use two main architectures for all our experiments, one for the BC experiments and another for
the RL experiments.
BC architecture. The NLD-AA dataset consists of ttyrec-formatted trajectories, which are
24 × 80 ASCII character and color grids (one for each) along with the cursor position. To encode
these, we modify the architecture used in Hambro et al., resulting in the following:
Screen encoder. This encoder takes the main part of the screen, a 21 × 80 grid per time step.
Note the top row and bottom two rows are cut off, as those are fed into the message and
bottom line statistics encoders, respectively. We embed each character and color in an
embedding lookup table, after which we concatenate them and place them in their respective
positions in the grid. We then feed this embedded grid into a ResNet, which consists of
2 identical modules, each using 1 convolutional layer followed by a max pooling layer and
2 residual blocks (of 2 convolutional layers each), for a total of 10 convolutional layers,
closely following the setup in Espeholt et al.
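As a concrete illustration, here is a minimal PyTorch sketch of such a screen encoder. The 21 × 80 input, the 2-module layout (1 convolution, a max pooling layer, then 2 residual blocks per module), and the 10-convolution total follow the description above; the embedding sizes, channel counts, vocabulary sizes, and class names are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with a skip connection (IMPALA-style)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        out = self.conv1(torch.relu(x))
        out = self.conv2(torch.relu(out))
        return x + out

class ScreenEncoder(nn.Module):
    """Embeds chars/colors, then 2 identical modules of
    conv -> max pool -> 2 residual blocks (10 convolutions total)."""
    def __init__(self, char_dim: int = 64, color_dim: int = 16,
                 channels=(64, 128)):  # embedding/channel sizes are assumptions
        super().__init__()
        self.char_emb = nn.Embedding(256, char_dim)   # one entry per ASCII code
        self.color_emb = nn.Embedding(32, color_dim)  # assumed color vocabulary
        in_ch = char_dim + color_dim
        layers = []
        for out_ch in channels:
            layers += [
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
                ResidualBlock(out_ch),
                ResidualBlock(out_ch),
            ]
            in_ch = out_ch
        self.net = nn.Sequential(*layers)

    def forward(self, chars: torch.Tensor, colors: torch.Tensor) -> torch.Tensor:
        # chars, colors: (B, 21, 80) integer grids from the main screen area
        x = torch.cat([self.char_emb(chars), self.color_emb(colors)], dim=-1)
        x = x.permute(0, 3, 1, 2)  # embeddings become the channel dimension
        return self.net(x).flatten(1)
```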
Message encoder. This encoder takes the top row of the screen (80 characters), turns the
characters into one-hot vectors, and concatenates these, resulting in an 80 × 256 = 20,480
dimensional vector representing the message. This vector is then fed into a 2-layer MLP,
resulting in the message representation.
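A minimal sketch of this encoder, again assuming PyTorch: the 80 × 256 = 20,480-dimensional one-hot input and the 2-layer MLP come from the description above, while the hidden width is an illustrative assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MessageEncoder(nn.Module):
    def __init__(self, hidden_dim: int = 128):  # hidden width is an assumption
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(80 * 256, hidden_dim),  # 20,480-dim one-hot message
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, message_chars: torch.Tensor) -> torch.Tensor:
        # message_chars: (B, 80) ASCII codes from the top row of the screen
        one_hot = F.one_hot(message_chars.long(), num_classes=256).float()
        return self.mlp(one_hot.flatten(1))  # (B, 20480) -> (B, hidden_dim)
```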
Bottom line statistics encoder. This encoder takes the bottom two rows of the grid and
creates a "character-normalized" (subtract 32 and divide by 96) and a "digits-normalized"
(subtract 47 and divide by 10, mask out ASCII characters smaller than 45 or larger than 58)
input representation, which we then stack, resulting in a 160 × 2 dimensional input. This
closely follows the Sample Factory model used in Hambro et al.
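The two normalizations are simple elementwise transforms; here is a minimal sketch under the same PyTorch assumption, using only the constants stated above:

```python
import torch

def encode_bottom_lines(chars: torch.Tensor) -> torch.Tensor:
    """chars: (B, 2, 80) ASCII codes from the bottom two screen rows.
    Returns a (B, 160, 2) tensor stacking the two normalizations."""
    flat = chars.reshape(chars.shape[0], -1).float()       # (B, 160)
    char_norm = (flat - 32.0) / 96.0                       # "character-normalized"
    digit_mask = ((flat >= 45) & (flat <= 58)).float()     # zero out codes outside 45..58
    digit_norm = ((flat - 47.0) / 10.0) * digit_mask       # "digits-normalized"
    return torch.stack([char_norm, digit_norm], dim=-1)
```

Stacking the two views gives the encoder both a generic character signal and a numeric signal on a convenient scale for the statistics (HP, gold, etc.) shown on the bottom lines.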