applet.control.n64: new applet #548

Open: wants to merge 1 commit into base: main

@rcombs commented Mar 28, 2024

Substantially derived from the UART applet. Runs tool-assisted speedruns on Nintendo 64 consoles. Can either stream input over USB, or play back a file hardcoded at build time (though in the latter case it must be very small). Could be expanded to support GameCube TASes fairly trivially, though I'm not sure how many of those exist (or sync on console). Can also be used to pipe real-time input to the controller port.

Wasn't sure how to classify this, so I put it in its own section for now.

I'll probably also add an [S]NES applet at some point.

@rcombs requested a review from @whitequark as a code owner on March 28, 2024 21:30
@whitequark (Member) left a comment

From a quick look this will not be mergeable as-is and will likely require significant changes. Due to the lack of review bandwidth this is all I can say for now, and there is no ETA for a more detailed review. Detailed review available at https://libera.irclog.whitequark.org/glasgow/2024-03-28#36073367

```python
import logging

logger = logging.getLogger(__name__)
help = "tool-assisted speedrun playback for Nintendo 64"
description = """
Play back tool-assisted speedruns on a Nintendo 64 console.
"""  # excerpt: the review view ends at this line of the description
```
A contributor commented:

suggestion: mention which format this is and a link to the format description:
https://tasvideos.org/EmulatorResources/Mupen/M64

@whitequark (Member)

@rcombs Would you like to finish the applet? If you do, I can provide a more detailed review to get it into good shape, but I would also expect you to become its code owner, since nobody else who maintains the code has an N64. If you don't, we can just close it.

@rcombs (Author)

rcombs commented Jul 24, 2024

Sure, happy to; I've just rebased on latest main, fixed all the API incompatibilities, and cleaned up the self mutations.

@whitequark (Member)

Please put it under the control taxon: control-n64 is where it would fit into the hierarchy. (Basically treating the N64 as a generic sink for commands of some kind.)

Please add yourself to CODEOWNERS. (I'll invite you to the org in a moment.)

I'll go through the rest of the code a bit later.

@whitequark (Member)

Could you describe the function of the applet on a high level please? I will then suggest an architecture for it that would be a better fit for the FPGA than the current one.

@rcombs changed the title from applet.tas.n64: new applet to applet.control.n64: new applet on Jul 25, 2024
@whitequark (Member)

I remembered there was a conversation here, which has some useful context. I would still appreciate a description of the architecture, because the `elaborate` function is fairly long and on a quick skim I don't immediately understand how it works.

@whitequark (Member)

To parse this format I would like it if the parser lived in glasgow.protocol.mupen_m64 and if it was structured similarly to the VGM one. The parser probably should parse the header and then have a method that returns the entire frame list as a bytearray.
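
A minimal sketch of such a parser, assuming the header layout as we understand it from the TASVideos M64 format description (signature `M64\x1A`, version 3, a 0x400-byte header, controller count at offset 0x15, input-sample count at 0x18, 4 bytes per controller per sample). The class `M64Movie` and its methods are hypothetical, not an existing Glasgow API, and the offsets should be checked against the format page before relying on them.

```python
import struct
from dataclasses import dataclass

SIGNATURE = b"M64\x1a"  # assumed file signature
HEADER_SIZE = 0x400     # assumed header size for version 3 files
FRAME_SIZE = 4          # bytes per controller per input sample


class M64Error(Exception):
    pass


@dataclass
class M64Movie:
    """Hypothetical Mupen64 .m64 reader: a few header fields plus raw input data."""
    version: int
    controllers: int
    input_samples: int
    data: bytes

    @classmethod
    def parse(cls, raw: bytes) -> "M64Movie":
        if len(raw) < HEADER_SIZE or raw[:4] != SIGNATURE:
            raise M64Error("not an M64 file")
        (version,) = struct.unpack_from("<I", raw, 0x04)
        if version != 3:
            raise M64Error(f"unsupported M64 version {version}")
        controllers = raw[0x15]                                 # assumed offset
        (input_samples,) = struct.unpack_from("<I", raw, 0x18)  # assumed offset
        return cls(version, controllers, input_samples, raw[HEADER_SIZE:])

    def frames(self) -> bytearray:
        """All input frames as one bytearray, suitable for a single write call."""
        usable = len(self.data) - len(self.data) % FRAME_SIZE  # drop a partial tail
        return bytearray(self.data[:usable])
```

Usage would then be roughly `movie = M64Movie.parse(path.read_bytes())` followed by one write of `movie.frames()`; everything before offset 0x400 is metadata the gateware never needs.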

This bytearray could then be sent directly to the FPGA with a single .write call. On the FPGA side, I would expect there to be an FSM that retrieves four bytes from OUT FIFO, (probably puts them into a data.Struct according to the frame format?), then pushes them into a stream.
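
As a plain-Python behavioral model (not Amaranth code), the FSM described above only needs a byte counter: collect four bytes from the OUT FIFO, split them into fields, and push one frame downstream. The field split into 16 button bits and signed X/Y bytes is an assumption based on the N64 poll response, and `Frame`/`FrameAssembler` are hypothetical names standing in for a `data.Struct` layout and the gateware FSM.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Hypothetical field split of one 32-bit frame (mirrors a data.Struct layout)."""
    buttons: int  # 16 button bits, first byte in the high half
    x: int        # signed joystick X
    y: int        # signed joystick Y

    @classmethod
    def from_bytes(cls, b: bytes) -> "Frame":
        return cls(buttons=(b[0] << 8) | b[1],
                   x=int.from_bytes(b[2:3], "big", signed=True),
                   y=int.from_bytes(b[3:4], "big", signed=True))


class FrameAssembler:
    """Model of the FSM: take 4 bytes from the OUT FIFO, emit one frame."""

    def __init__(self):
        self.count = 0             # FSM state: bytes collected for the current frame
        self.shift = bytearray(4)
        self.out = deque()         # stands in for the outgoing frame stream

    def push_byte(self, byte: int) -> None:
        self.shift[self.count] = byte
        self.count += 1
        if self.count == 4:        # frame complete: hand it downstream and restart
            self.out.append(Frame.from_bytes(bytes(self.shift)))
            self.count = 0
```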

For driving the I/O, I would expect there to be a wiring.Component that accepts a stream of frames, and provides them to the N64 each time the I/O state is requested. Since there is a request-response protocol involved, I would expect this component to have this basic structure:

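```python
from amaranth.lib import stream, wiring
from amaranth.lib.wiring import In, Out


class N64ControllerEmulator(wiring.Component):
    i_data: In(stream.Signature(8))   # bytes received from the console
    o_data: Out(stream.Signature(8))  # bytes to send back to the console

    inputs: In(stream.Signature(32))  # one 32-bit controller state per frame
```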

Two more components would convert between i_data/o_data and the odd Manchester-like line encoding used by the N64 (one decoding the console's commands, one encoding the responses).
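
A behavioral sketch of that line encoding, assuming the commonly documented N64 timing: each bit occupies 4 µs, a 0 is 3 µs low then 1 µs high, a 1 is 1 µs low then 3 µs high, so a receiver can decide each bit from the third microsecond of the cell. Stop bits are left out, the line is modeled as one sample per microsecond, and `encode`/`decode` are hypothetical names.

```python
from typing import List

# One entry per microsecond: 0 = line pulled low, 1 = line released (high).
ZERO_BIT = [0, 0, 0, 1]  # 3 us low, 1 us high
ONE_BIT = [0, 1, 1, 1]   # 1 us low, 3 us high


def encode(data: bytes) -> List[int]:
    """Encode bytes MSB-first into 1 us line samples (stop bit omitted)."""
    samples: List[int] = []
    for byte in data:
        for i in range(7, -1, -1):
            samples += ONE_BIT if (byte >> i) & 1 else ZERO_BIT
    return samples


def decode(samples: List[int]) -> bytes:
    """Recover bytes by sampling each 4 us bit cell in its third microsecond."""
    bits = [samples[i + 2] for i in range(0, len(samples) - 3, 4)]
    out = bytearray()
    for i in range(0, len(bits) - 7, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)
```

Sampling mid-cell is what makes the clock tolerance forgiving: the falling edge at the start of each cell resynchronizes the receiver, so it only has to stay in step for about 2 µs.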

Does this sound good?

@rcombs (Author)

rcombs commented Jul 25, 2024

There is a single data pin, which connects to the data line of a controller port on a Nintendo 64 console. The data line is a 1-wire, self-clocking serial connection. The console sends commands, and the gateware responds to them. The most common commands are "pair controller" (8 bits, where the response is 24 bits of configuration data) and "poll gamepad state" (8 bits, where the response is 32 bits: 16 button bits plus one byte each for the joystick X and Y axes); less common commands include reads and writes to mapped I/O devices that can be attached to a controller (save-data EEPROMs, vibration motors, Game Boy cartridge ROM and RAM), and those commands may be longer. All commands from the console must get a response within 62.5µs.
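
A behavioral sketch of this request/response handling. The command bytes (0x00 and 0xFF for the identify or "pair controller" request, 0x01 for the poll) and the 3-byte identity reply `05 00 02` (standard controller, no pak) are assumptions from commonly circulated protocol notes, not from this thread; pak reads and writes are not modeled, and `handle_command` is a hypothetical name.

```python
from typing import Callable, Optional

CMD_INFO = 0x00   # assumed: "pair controller" / identify
CMD_POLL = 0x01   # assumed: poll gamepad state
CMD_RESET = 0xFF  # assumed: reset, answered like identify
IDENTITY = bytes([0x05, 0x00, 0x02])  # assumed: standard controller, no pak


def handle_command(command: bytes,
                   next_frame: Callable[[], Optional[bytes]]) -> Optional[bytes]:
    """Return the reply to one console command, or None to stay silent.

    next_frame() returns the next 4-byte input frame, or None once playback
    has run out, in which case polls are answered with all zeroes.
    """
    if not command:
        return None
    op = command[0]
    if op in (CMD_INFO, CMD_RESET):
        return IDENTITY                                  # 24-bit reply
    if op == CMD_POLL:
        frame = next_frame()
        return frame if frame is not None else bytes(4)  # 32-bit reply
    return None  # pak reads/writes and other commands are not modeled here
```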

On startup, the console sends the pair controller command, then polls for input repeatedly (generally at the game's frame rate). The applet sends input frames played back from a file or pipe on the host machine, which may optionally have a Mupen64 header containing metadata (which is largely irrelevant to us). The host software logs each command from the console so that higher-level control software can know the state of the playback (e.g. what buttons are being pressed, how far into the file the run currently is, etc). Once the last input is read, this state is reported to the host, and further input polls from the console receive all-zeroes responses.

@whitequark (Member)

> This bytearray could then be sent directly to the FPGA with a single .write call.

Regarding your latency concerns: in general, doing this will result in a significant portion of the data (you can look up exactly how much in access.direct.demultiplexer) being placed into a kernel buffer, in chunks. This kernel buffer is then provided to a scatter-gather DMA engine in the XHCI controller, which autonomously retrieves the data and feeds it to the device. It then signals completion for each chunk, which causes the Python process to wake up and feed it additional chunks.

If you're doing strictly unidirectional communication, almost all the work is done by the XHCI scatter-gather DMA engine, so it doesn't matter what the kernel does as long as the XHCI controller doesn't break. I would have actually placed all of the data into the USB buffers, but this requires changing a sysctl for silly Linux reasons, so I'm limiting it.

@whitequark (Member)

> The host software logs each command from the console so that higher-level control software can know the state of the playback (e.g. what buttons are being pressed, how far into the file the run currently is, etc). Once the last input is read, this state is reported to the host, and further input polls from the console receive all-zeroes responses.

Oh, that's interesting. I would probably keep the single write call and add a way to asynchronously poll the core via the read_register functionality, since then the host software isn't plugged into the data path. Would that work?

@rcombs (Author)

rcombs commented Jul 25, 2024

Re: #548 (comment), that generally sounds good, with one caveat: I have some use-cases where the input is streamed over a pipe, and is not knowable in its entirety at startup (e.g. taking input from stream chat, or from a hardware device); returning a bytearray wouldn't work there. (That would generally want a minimally-sized buffer to reduce latency, and to send all-zeroes whenever a poll happens while the buffer is empty.)
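
On the host side, the streamed case mostly comes down to reassembling fixed-size 4-byte frames from a pipe that may deliver partial reads, without waiting for a large chunk to fill. A minimal sketch, assuming a buffered binary stream such as `sys.stdin.buffer`; `iter_frames` is a hypothetical helper, and forwarding each frame to the device is left out.

```python
import io
from typing import Iterator

FRAME_SIZE = 4


def iter_frames(stream: io.BufferedIOBase, chunk: int = 64) -> Iterator[bytes]:
    """Yield complete 4-byte frames as soon as they are available."""
    pending = bytearray()
    while True:
        # read1() returns whatever is available (up to `chunk` bytes) instead of
        # blocking until a full chunk arrives, which keeps per-frame latency low.
        data = stream.read1(chunk)
        if not data:  # EOF: any trailing partial frame is dropped
            return
        pending += data
        while len(pending) >= FRAME_SIZE:
            yield bytes(pending[:FRAME_SIZE])
            del pending[:FRAME_SIZE]
```

For example, `for frame in iter_frames(sys.stdin.buffer): ...` forwards each frame as soon as the producer has written all four bytes; the "buffer empty, send zeroes" case stays in the gateware.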

Note that once the RAM-Pak is available, I'd like to also support extremely deep buffering (on the order of hours) to allow the gateware to continue running even if the host machine crashes. I previously verified a 39-day TAS, which required a playback device with extreme levels of robustness; an earlier attempt failed when the host machine suffered a kernel panic (probably attributable to graphics driver problems running OBS).

Re: read_register, hmm, my reflexive concern is that polling might result in timing issues with the inputs (measuring the time between two polls allows TAS authors to determine whether their run is causing lag on the console, and the duration between the controller-pair and the final input is used to determine the length of the run for record/leaderboard purposes), but I suppose it should be possible to poll frequently enough that that isn't a meaningful problem?

@whitequark (Member)

> I have some use-cases where the input is streamed over a pipe, and is not knowable in its entirety at startup (e.g. taking input from stream chat, or from a hardware device); returning a bytearray wouldn't work there. (That would generally want a minimally-sized buffer to reduce latency, and to send all-zeroes whenever a poll happens while the buffer is empty.)

Yes, that seems completely reasonable, and straightforward to implement with the architecture I proposed (when you run out of stream, return zeroes).

The basic algorithm for retrieving frames would be something like:

  1. Check inputs.valid.
  2. If it's high, there is data, return the data. Leave the data there.
  3. If it's low, return zeroes.
  4. When there is a new frame, strobe inputs.ready for one cycle.
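
The four steps can be modeled in plain Python as below (a behavioral sketch, not Amaranth code; `InputStreamModel` is a hypothetical name). When the "new frame" event fires is deliberately left to the caller, since that is exactly the open question that follows.

```python
from collections import deque
from typing import Deque

ZERO_FRAME = bytes(4)


class InputStreamModel:
    """Model of the `inputs` stream as seen by the controller emulator."""

    def __init__(self):
        self.fifo: Deque[bytes] = deque()

    @property
    def valid(self) -> bool:
        return bool(self.fifo)

    def peek(self) -> bytes:
        # Steps 1-3: if valid, return the data but leave it in place; else zeroes.
        return self.fifo[0] if self.valid else ZERO_FRAME

    def strobe_ready(self) -> None:
        # Step 4: a one-cycle ready pulse consumes the current frame, if any.
        if self.valid:
            self.fifo.popleft()
```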

How do you know if there's a new frame? There must be some kind of shared clock, but what is it?

@whitequark (Member)

> Note that once the RAM-Pak is available, I'd like to also support extremely deep buffering (on order of hours) to allow the gateware to continue running even if the host machine crashes.

RAM-Pak is going to be (usually) presented to the gateware as an ultradeep FIFO, with all the HyperRAM specific concerns being hidden from view. You should be able to request this FIFO in the applet normally: so long as you aren't filling it, the data will "fall through" and never hit the HyperRAM at all (so, no latency added); if you do fill it, you have 1 Gbit to fill.
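
A quick worked estimate of what that capacity means for playback, assuming "1 Gbit" means 10^9 bits (2^30 would give about 7% more), one 4-byte frame per poll, and roughly 30 polls per second as described in this thread; `buffer_hours` is a hypothetical helper.

```python
def buffer_hours(capacity_bits: float, bytes_per_frame: int = 4,
                 polls_per_second: float = 30.0) -> float:
    """Hours of input a FIFO of the given size can hold, at one frame per poll."""
    frames = capacity_bits / 8 / bytes_per_frame
    return frames / polls_per_second / 3600
```

With these assumptions, `buffer_hours(1e9)` comes to about 289 hours (roughly 12 days), and about 145 hours if a game polls at 60 Hz, comfortably beyond the "order of hours" target.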

> Re: read_register, hmm, my reflexive concern is that polling might result in timing issues with the inputs (measuring the time between two polls allows TAS authors to determine whether their run is causing lag on the console, and the duration between the controller-pair and the final input is used to determine the length of the run for record/leaderboard purposes), but I suppose it should be possible to poll frequently enough that that isn't a meaningful problem?

Can you explain the timing issues in more detail?

@rcombs (Author)

rcombs commented Jul 25, 2024

> How do you know if there's a new frame? There must be some kind of shared clock, but what is it?

I'm a little bit confused here; I'm imagining this as just "other software sends 4 bytes on a pipe, expecting that the applet will ~immediately send them to the console".

Re: RAM-Pak, that sounds excellent.

The timing is just "it has to be possible for external code on the host machine to determine the duration between any two consecutive polls, and between controller-pair and the final poll, with ~millisecond precision", which I don't imagine should be particularly difficult to achieve regardless of the design.
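
Given a timestamp for the pairing command and one per poll (how those timestamps are captured is left open here), both measurements are simple differences. A minimal sketch; `poll_timing` is a hypothetical helper and all times are in seconds.

```python
from typing import List, Tuple


def poll_timing(pair_time: float, poll_times: List[float]) -> Tuple[List[float], float]:
    """Return (intervals between consecutive polls, run length).

    Run length is measured from controller pairing to the final poll.
    """
    intervals = [b - a for a, b in zip(poll_times, poll_times[1:])]
    run_length = poll_times[-1] - pair_time if poll_times else 0.0
    return intervals, run_length
```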

@whitequark (Member)

> I'm a little bit confused here; I'm imagining this as just "other software sends 4 bytes on a pipe, expecting that the applet will ~immediately send them to the console".

The part I don't understand is: you send a frame worth of inputs (4 bytes) per video frame (so, 25 times per second or... something like that. 30? whatever it is). To do this you need to know when to advance your inputs. To do this precisely and not get sidetracked with 39-day-long runs you need to somehow synchronize the clock you have on Glasgow and the clock you have on N64, such that they all point to the same frame number at all times. How is this done?

@rcombs (Author)

rcombs commented Jul 25, 2024

Ah! The console (and the game software running on it) doesn't have a concept of frame numbers, or of a substantial buffer of them; it only has the current controller state (cur_state) and the state from the most recent previous poll (prev_state, which is xor'd against the current state to determine which buttons were pressed or released since the last poll). It sends a poll command once per frame (usually around 30Hz); when it gets a response, it does prev_state = cur_state; cur_state = command_response;. For our purposes, video frames don't matter; we only care about controller polls (which, for Mario 64 and some games like it, happen exactly once per game "tick", whether that tick lasts two 60Hz fields, three, or more).
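
The per-poll update can be sketched as follows, treating the controller state as an integer bitmask of buttons; individual games may differ, and `on_poll_response` is a hypothetical name, not code from any game.

```python
def on_poll_response(cur_state: int, response: int):
    """Model the game's per-poll update, then diff old and new state with XOR."""
    prev_state, cur_state = cur_state, response  # prev_state = cur_state; cur_state = response
    changed = prev_state ^ cur_state
    pressed = changed & cur_state    # bits that went 0 -> 1 since the last poll
    released = changed & prev_state  # bits that went 1 -> 0 since the last poll
    return prev_state, cur_state, pressed, released
```

Because nothing else is kept, the game's view of time advances only when a poll is answered, which is why the input file can be indexed by poll rather than by video frame.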

Put another way, a TAS file is a series of controller states which, when sent to the console in response to a sequence of consecutive input polls, deterministically produce a particular game state. So for synchronization purposes, we effectively have 2 clocks: the wire protocol clock (1MHz clocks run separately by the console and the controller/Glasgow, with fairly forgiving tolerances, since they only need to stay synchronized for a few µs at a time), and the frame clock (30Hz-ish, pulsed by the console sending the poll command).

@whitequark (Member)

Oh, okay. In this case the N64ControllerEmulator gateware would simply consume a frame worth of inputs each time it is polled. This is even easier to implement.

Can you tell me more about the status information you'd like to retrieve? I don't entirely understand the purpose and the mechanisms of it.

@rcombs (Author)

rcombs commented Jul 25, 2024

The most important status information is just the commands we receive from the console (and the times we receive them at). Everything else can, in principle, be determined from that combined with the TAS input. For convenience (and robustness against host machine downtime), my existing Teensy-based setup reports frame number, frame contents, time since controller pairing, and some other internal state on each frame (which allows the display code to be largely stateless, and requires no state at all to report the critical timing information).
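
A sketch of a per-poll status record along those lines, with a text rendering of the frame contents. The button order used here (byte 0: A, B, Z, Start, D-pad up/down/left/right; byte 1: two unused bits, L, R, C up/down/left/right; then signed X and Y) is the commonly documented N64 poll layout, stated as an assumption; `FrameStatus` is a hypothetical name, not the Teensy firmware's format.

```python
from dataclasses import dataclass

# Assumed N64 poll-response button layout, MSB first (None = unused bit).
BUTTONS = ["A", "B", "Z", "Start", "DU", "DD", "DL", "DR",
           None, None, "L", "R", "CU", "CD", "CL", "CR"]


@dataclass
class FrameStatus:
    """Hypothetical per-poll status record for a stateless overlay."""
    frame: int        # poll number since controller pairing
    elapsed_s: float  # time since controller pairing
    data: bytes       # the 4-byte frame sent to the console

    def describe(self) -> str:
        bits = (self.data[0] << 8) | self.data[1]
        held = [name for i, name in enumerate(BUTTONS)
                if name and bits & (0x8000 >> i)]
        x = int.from_bytes(self.data[2:3], "big", signed=True)
        y = int.from_bytes(self.data[3:4], "big", signed=True)
        return f"{self.frame:7d} {self.elapsed_s:10.3f}s {' '.join(held) or '-'} ({x:+d},{y:+d})"
```

Since every record carries the frame number and elapsed time, a display can render any single record without remembering earlier ones.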

I display this in an OBS overlay that looks like this:
[Image: screenshot of the OBS overlay]

Here's a short example video: https://www.twitch.tv/videos/1907721521

Frame numbers that take an unusually long time (roughly 3/60 of a second or more, instead of the usual 2/60) are also logged to stdio by the host software, and I send the output to the TAS file author for analysis.
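
A sketch of that lag check from poll timestamps. The threshold of 2.5 fields (halfway between the normal 2 and the lagged 3) is an assumption chosen to tolerate timestamp jitter, and frame n is taken to span from poll n to poll n + 1; `slow_frames` is a hypothetical helper.

```python
from typing import Iterator, List, Tuple

FIELD = 1 / 60  # one 60 Hz field, in seconds


def slow_frames(poll_times: List[float],
                threshold_fields: float = 2.5) -> Iterator[Tuple[int, float]]:
    """Yield (frame number, duration) for frames lasting at least threshold_fields."""
    for n, (start, end) in enumerate(zip(poll_times, poll_times[1:])):
        duration = end - start
        if duration >= threshold_fields * FIELD:
            yield n, duration
```

For example, `for n, d in slow_frames(times): print(f"frame {n}: {d * 60:.1f} fields")` produces a short report that can be sent to the TAS author.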

3 participants