design server / client IPC for sharing framebuffer #4
Comments
We will likely still have to deal with this for launchers, since xochitl uses its own SWTCON, right? That means that if you use a launcher, this solution will not provide a mechanism to just "screenshot the screen" if xochitl is then started by the launcher. |
at the minimum, we need to deal with this for xochitl, yes. it would be cool if we can use LD_PRELOAD to expose xochitl's framebuffer memory through shared mem since the APIs use a pointer internally. the ideal case is to write something that searches for the memory location (and is not binary specific) but we can use hardcoded memory addresses for each binary hash of xochitl like reStream if it comes to it. |
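A minimal Python sketch of the "hardcoded address per binary hash" fallback described above. Everything here is an assumption for illustration: the helper names `sha256_of` and `lookup_fb_address`, the table layout, and the placeholder address `0x1000` are hypothetical, and no real xochitl hash or address is given.

```python
import hashlib
import os
import tempfile


def sha256_of(path):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def lookup_fb_address(path, known_builds):
    """Return the address recorded for this exact binary, or None if unknown."""
    return known_builds.get(sha256_of(path))


# Demo with a stand-in file, since no real binary is assumed to be present.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"stand-in for a xochitl binary")
    demo_path = f.name
try:
    # Hypothetical table: hash of a known build -> placeholder address.
    known_builds = {sha256_of(demo_path): 0x1000}
    found = lookup_fb_address(demo_path, known_builds)
    missing = lookup_fb_address(demo_path, {})  # unknown build
finally:
    os.unlink(demo_path)
```

An unknown build returns `None`, so a caller can refuse to hook rather than guess at an address.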
after playing with rm2fb.so and xochitl and having them run / update the screen, it's possible for the SWTCON to get out of alignment and only one process will be able to draw, while the other process' SWTCON will be misbehaving. after a while, the misbehaving process may be able to draw again, but the other process will then have trouble. unable to draw = draws lighter than it should or draws more black pixels / lots of contrast. after talking to bokluk, this behavior seems like a reason to only have one SWTCON running at a time. alternatively, we can figure out what specifically causes this behavior and try to prevent it, allowing multiple SWTCONs to exist. |
So until this is solved, we won't be able to properly port launchers that expect to run before xochitl? |
i'm not sure. i think it is easy(ish) to return to the days of a launcher only launching one app (then we never have problems with multiple SWTCON running), but in that case the launcher will have to re-calibrate its own SWTCON when an app it launched dies. |
Restarting the launcher's SWTCON and force redrawing might be an option. It'll be kinda slow, but at least it would work. |
an interesting converse here (that you made me think of with your comment) is to restart xochitl whenever we return to it. i will try this out to see how it does |
I don't have an rM2 (at least at the moment), but I have done some work on rM1 (most notably, I wrote the VNC server which some folks use to effectively stream the framebuffer via snooping on the MXC_SEND_UPDATE ioctls). I'm trying to understand the rM2 framebuffer in order to, among other things, see if said server can be ported to it while maintaining the good efficiency we have on rM1. I was wondering if I am understanding the rM2 fb architecture correctly:
Is this approximately correct? I would also be curious whether any of you have had success in reverse engineering the interface between the framebuffer-update thread and the actual hardware (and whichever kernel module this goes through). I do rather have some concern around this (client/server) approach with regards to latency, as it seems that it unavoidably introduces an extra two context switches due to the involvement of an extra userspace process and potentially an extra copy. If we need a single shared SWTCON thread, I would be interested in whether we could potentially move it into the kernel, replacing xochitl's calls into it with those into the kernel? I do not have a good enough understanding of the back end architecture of the SWTCON thread (based on the files I see here, and without my own device, unfortunately) to be clear on whether or not this would introduce one extra context switch into xochitl, but it seems like it would still be better than trying to run it in another userspace process. |
@pl-semiotics the architecture is: the app writes to a buffer, then the buffer is copied/mangled/transformed with eink waveforms and written to /dev/fb0 and from there to the screen, the driver being
the client/server approach was chosen because:
|
@ddvk thanks for the elaboration, that matches roughly what I was expecting (unfortunately for me). I'm a bit curious where update region detection and pushing an update to the display happens at the moment, though---as far as I understood, the actual display driver IC would most of the time remain idle, and only actually apply nonzero voltages to the display when it is informed of a region being updated? The mxsfb seems to provide a much more normal framebuffer interface, and doesn't do anything to detect writes to it, so I'm a bit unclear on how the driver knows when to actually apply the update waveforms that are being written to that area of memory. In particular, mxsfb seems to be designed to expose the embedded LCDIF block on i.MX and as best as I can tell from a quick read of the driver, seems to be configuring it in a dotclock mode where indeed a pixel's worth of data will be sent out every pixclock, which doesn't seem at all the appropriate way to drive an e-ink display. I understand entirely that rationale for the client/server approach. My thought is more: does it perhaps make sense to move the "server" portion into the kernel, for best performance? (Admittedly, this may degrade native xochitl slightly, if the update regions are in fact otherwise being magically DMA'd by the device somehow; but it'd still be better than a userspace-shared-memory solution). And this way we can even provide a compatible update ioctl() with rM1. (I wonder also whether mxc_epdc_v2_fb can be ported to be ~as performant as the binary userspace component, since I thought other i.MX7Dual devices used it). |
regarding performance, i would like to be methodical. that is to say :
i don't want to pursue an avenue for perf benefit until we have a sense of timings and see that the context switches or ipc comm are really a problem. my sense is that the biggest perf benefit you get is only sending over dirty regions for vnc, which will save large amounts of time (100-200 ms to save the whole buffer vs 5-20 ms for small dirty regions while the pen is drawing). context switches are on the order of a few microseconds, as is the ipc communication. i'm not against a kernel driver, it makes sense, but i wouldn't want to put qt code in the kernel, so we'd need an open impl. before making a kernel driver. re mxsfb: i didn't think the driver is aware of dirty regions, rather the swtcon would be. |
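To make the dirty-region point concrete, here is a rough Python sketch comparing how much data a full-frame save touches with a small merged dirty rectangle. The 1404x1872 resolution and 2 bytes per pixel are assumptions, as is the `Rect` helper. It only counts bytes and does not reproduce the timings quoted above.

```python
from dataclasses import dataclass

# Assumed panel geometry and pixel size; adjust if the real buffer differs.
WIDTH, HEIGHT, BYTES_PER_PIXEL = 1404, 1872, 2


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int

    def union(self, other):
        """Smallest rectangle covering both self and other."""
        x0 = min(self.x, other.x)
        y0 = min(self.y, other.y)
        x1 = max(self.x + self.w, other.x + other.w)
        y1 = max(self.y + self.h, other.y + other.h)
        return Rect(x0, y0, x1 - x0, y1 - y0)

    def nbytes(self):
        """Bytes of pixel data covered by this rectangle."""
        return self.w * self.h * BYTES_PER_PIXEL


full = Rect(0, 0, WIDTH, HEIGHT)
# Two small pen updates merged into a single dirty region.
stroke = Rect(600, 900, 40, 30).union(Rect(620, 910, 50, 40))

print(full.nbytes(), stroke.nbytes())
```

Under these assumptions the full frame is about 5 MB while the merged stroke region is a few kilobytes, which is the gap that sending only dirty regions exploits.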
@raisjn Sorry, I should have been clearer---my performance concerns are not at all about the VNC solution, for which other (e.g. network) latencies are certainly orders of magnitude beyond those provided by the driver thread situation. (For that, I'm mostly interested in seeing how far down the stack I can hook in so as to maximize compatibility with different userspaces). I am interested, however, in ensuring that xochitl performance is not degraded when e.g. doing windowing things, since even a few ms there may be quite noticeable (I have half designed (if not yet quite implemented due to lack of time) a compositing approach for RM1 based on virtual FBs that intercept the update ioctls, which is able to provide quite good performance and eliminate an extra userspace process from the fastpath for display updates); and I also think it would be pleasant to provide a compatible interface with RM1. My first instinct is also that it feels rather cleaner to put this portion of hardware driving in the kernel; and there is perhaps a certain aesthetic advantage to not dipping back into another userspace thread here. Yes, I would certainly agree that trying to load the existing binary driver in the kernel would be a bad idea :) I suppose my question is how swtcon is conveying the dirty region to the hardware, as it certainly does not appear to be doing so via mxsfb, and I'm not seeing any other obvious (enabled) device drivers in the kernel source. 
I could imagine that certain writes to the mapped address space are doorbells for the epd controller (and this would probably be the one case where I'd argue to not add anything to the kernel; but rather (most likely) to try to figure out how to allow application swtcon threads to work together, perhaps by storing the relevant display state in shared memory), but I don't see any relevant configuration happening in mxsfb, and the memory space seems to come from dma_alloc_writecombine() rather than any hardware provided memory-mapping region. |
very cool! for our client ipc, we are using ld_preload to shim rm1 apps and pretend to be the mxcfb driver (see issue #12), and hopefully in the future (rm2fb v2) we can start doing cool things around split screen apps, windowing, etc. do you have a design doc for the compositor that can be shared?
this is something we are also thinking about |
@pl-semiotics When you get around to implementing it, please let me know. I'd love to pull that into Oxide. |
I'm closing this out as there is now a server / client implementation that uses LD_PRELOAD hooks and is mostly transparent to rM1 applications. A new issue can be opened for v2 design questions: per app framebuffers, dirty region emitting, etc |
@raisjn @Eeems This is perhaps not the right place to mention this, but since you seemed interested in my compositor plan---now that I'm done with the vnc server updates that were taking up most of my remarkable-related time, here are a couple of updates. I was originally going to write my own minimalistic compositor protocol implemented in-kernel with virtual framebuffers and a number of optimizations, but I'm now (seeing that the RM2 does everything in userspace, not using PXP/etc. hardware at all) thinking that it's probably better to just reuse Wayland (especially since one ought to get shared-memory Xwayland ~for free in that environment). If there are performance issues, it may be worth adding some protocol extensions for single-buffering and cropping buffers/matching stride sizes to reduce the number of copies needed, but my plan is to get a MWE by hacking together a wl_shm style compositor that draws to the rM1/2 framebuffer (and wrapper that emulates the RM1/RM2 native interface sufficiently for xochitl to draw to wayland---although for RM2 this is of course probably something that directly replaces the libqsgepaper EPFramebuffer with one backed by wayland). |
background
The rM2's framebuffer is not well understood, but we know it requires a software thread (SWTCON) to drive it. This thread runs with high priority and is responsible for using data from a memory location to drive the contents of the display.
We are able to get access to the framebuffer update APIs by using LD_PRELOAD while loading the remarkable-shutdown binary and hardcoding their memory addresses. The update API is similar to the way the rM1 updates: it takes a rectangle and a few arguments (like waveform_mode) and updates the display.
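As a sketch of what an update request of this shape could look like when passed between processes, the Python below packs a rectangle plus a `waveform_mode` into a fixed-size message. The field order, the extra `update_mode` field, the little-endian 32-bit layout, and the numeric values are all assumptions for illustration, not the actual rM1 or rM2 structure.

```python
import struct
from collections import namedtuple

# Hypothetical wire layout: six little-endian unsigned 32-bit integers.
UPDATE_FORMAT = struct.Struct("<6I")

UpdateRequest = namedtuple(
    "UpdateRequest", "x y w h waveform_mode update_mode"
)


def pack_update(req):
    """Serialize an UpdateRequest into a fixed-size byte string."""
    return UPDATE_FORMAT.pack(*req)


def unpack_update(data):
    """Parse bytes produced by pack_update back into an UpdateRequest."""
    return UpdateRequest(*UPDATE_FORMAT.unpack(data))


# Placeholder values; waveform_mode=1 is not a real constant.
request = UpdateRequest(x=100, y=200, w=300, h=50, waveform_mode=1, update_mode=0)
wire = pack_update(request)
decoded = unpack_update(wire)
```

A fixed-size message keeps the receiving side simple: it can read exactly `UPDATE_FORMAT.size` bytes per request.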
We propose using a client/server model for interacting with the framebuffer, instead of each application starting its own SWTCON. (supernote a6 uses a similar design: only one process interacts with the framebuffer and the rest use IPC to communicate with it.)
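A minimal sketch of the shared-buffer half of this model, using Python's `multiprocessing.shared_memory`: one side creates the buffer (standing in for the server that owns the SWTCON) and another attaches to it by name and writes a pixel. For brevity both handles live in one process here, the buffer is a tiny 16x8 demo rather than the real screen size, and the 2-byte pixel format is an assumption. Notifying the server about the update is not shown.

```python
from multiprocessing import shared_memory

# Tiny demo geometry (assumed 2 bytes per pixel), not the real panel size.
WIDTH, HEIGHT, BPP = 16, 8, 2
SIZE = WIDTH * HEIGHT * BPP


def offset(x, y):
    """Byte offset of pixel (x, y) in a row-major buffer."""
    return (y * WIDTH + x) * BPP


# "Server" side: create the shared framebuffer and clear it to white.
server_fb = shared_memory.SharedMemory(create=True, size=SIZE)
try:
    server_fb.buf[:SIZE] = b"\xff" * SIZE

    # "Client" side: attach to the same memory by name and draw one black pixel.
    client_fb = shared_memory.SharedMemory(name=server_fb.name)
    try:
        start = offset(3, 2)
        client_fb.buf[start:start + BPP] = b"\x00" * BPP
    finally:
        client_fb.close()

    # The server sees the client's write without any copy over a socket.
    drawn = bytes(server_fb.buf[start:start + BPP])
    neighbour = bytes(server_fb.buf[offset(4, 2):offset(4, 2) + BPP])
finally:
    server_fb.close()
    server_fb.unlink()
```

In a real setup the client would then send the dirty rectangle to the server over some IPC channel so that only the one process drives the display.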
proposal
Underlying technology:
remarks
advantages of using a server/client model
potential pitfalls
open questions
prototype
see master branch