OSD output keeps disappearing in VGA mode on SPI1 #45

Open
tkurbad opened this issue Dec 30, 2021 · 10 comments

@tkurbad

tkurbad commented Dec 30, 2021

I thought I'd break this out into a new issue as I can't seem to get hold of the cause.

Problem:
If a FF-OSD is connected to an Amiga on a flicker fixed (i.e. 31kHz) output via SPI1 of the blue pill, after the first mode switch, the OSD output will disappear. There will either be no OSD at all or only empty, but properly resized, background boxes without text.

To reproduce:
(1) Connect a blue pill with latest FF-OSD firmware via its SPI1 output to a flicker fixed Amiga output. Configure the FF-OSD to either work in VGA or automatic sync mode. Connect the Amiga keyboard lines as well.
(2) Start up the Amiga w/ both mouse buttons pressed to enter the early startup menu.
(3) After entering early startup, press LCTRL-LALT-Del on the Amiga keyboard to switch the OSD off. Instead of 'OSD Off', you will get no text output on screen, but just an empty OSD background box.
(4) Press LCTRL-LALT-Del again to turn the OSD back on. All subsequent OSD output will just be empty boxes (or nothing at all)

Note that all of this ONLY occurs while on SPI1 AND in VGA mode. I could neither reproduce it in 15 kHz mode nor on SPI2.

I tried to debug this, but I'm at my wits' end now.
All the text to be shown is properly stored in the appropriate arrays, the render_line function seemingly does what it should, and the empty OSD boxes are resized for the content they are supposed to show.
The occurrence of the problem can be deferred (but not completely eliminated) by using the -O1 optimization level instead of -Os at compile time.

IMHO, this hints at either some barrier/timing problem or at some interrupt being missed.

I'd be very glad if someone could try and reproduce this as per the above steps so I can rule out a problem with my hardware setup.

Thanks in advance!

@keirf
Owner

keirf commented Dec 30, 2021

I just had a user report that OSD hangs/disappears on their system after a while when running v1.9. Previously they ran v1.8. So it could be worth a test of your changes on top of v1.8?

@keirf
Owner

keirf commented Dec 30, 2021

Another thing to try is a hard reset of the SPI peripheral during reconfigure. This can be enacted via RCC APBxRSTR registers.
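A minimal sketch of such a reset pulse, assuming an STM32F103 target. The bit positions are taken from the STM32F1 reference manual and should be checked against the part actually used; `periph_reset_pulse` is a hypothetical helper name, not a function from the firmware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reset-bit positions per the STM32F1 reference manual
 * (assumption: the target is an STM32F103 "blue pill"). */
#define RCC_APB2RSTR_SPI1RST (1u << 12) /* SPI1 sits on APB2 */
#define RCC_APB1RSTR_SPI2RST (1u << 14) /* SPI2 sits on APB1 */

/* Pulse one reset bit: assert reset, then release it.
 * 'rstr' points at RCC_APB1RSTR or RCC_APB2RSTR on hardware; taking a
 * pointer keeps the helper testable against a plain variable. */
static void periph_reset_pulse(volatile uint32_t *rstr, uint32_t bit)
{
    assert(rstr != NULL);
    *rstr |= bit;   /* hold the peripheral in reset */
    *rstr &= ~bit;  /* release reset; peripheral registers return to defaults */
}
```

On an STM32F103 the call for SPI1 would look roughly like `periph_reset_pulse((volatile uint32_t *)0x4002100C, RCC_APB2RSTR_SPI1RST)`, where the address is RCC base plus the APB2RSTR offset as listed in the reference manual. The SPI has to be fully reconfigured afterwards, since the reset clears its control registers.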

@tkurbad
Author

tkurbad commented Dec 30, 2021

Interesting. I'll try bare 1.8 later and see if it works for me.

@tkurbad
Author

tkurbad commented Dec 30, 2021

Tried a few more things. Resetting the SPI peripheral before applying a new mode in the setup_spiX function has no apparent effect. This was no surprise, because there isn't really a mode switch happening between turning on the Amiga and entering the early startup. Resolution and frequency stay the same AFAIK.

Looking at the (small) diff between v1.8 and v1.9, I think the culprit is in the revised handling of the timers wrt the AT32F403.

I guess I'll do some kind of a v1.8.1 first, with all the hotkey functionality of v1.9 but w/o the F403 related changes and see how this goes.
If that works, I'll try and implement the proposed changes of issue #44 on top of that.

Next week I'll hopefully have much more time for all of this.

@keirf
Owner

keirf commented Dec 31, 2021

I take it the 403 changes in v1.9 are still only suspected rather than definitively blamed?

@tkurbad
Author

tkurbad commented Dec 31, 2021

Yes, it's still only a suspicion so far. Debugging is awfully slow with two 3 year olds that need constant attention during daytime... ;)

And, because of that, the most obvious test, namely checking whether bare v1.8 has the issue as well, is still on my to-do list.

@tkurbad
Author

tkurbad commented Dec 31, 2021

Ok, tried v1.8 now, and it behaves even worse: Whenever the Amiga output is switched off (or away from) and back on again, the next hotkey action completely stalls the OSD.

As with v1.9, this only happens while the flicker fixer is switched on, in 15 kHz mode everything is (and stays) fine. I'll have to dig deeper with more time next week. I'd like to look at the sync signals with an oscilloscope to better understand what's going on.

PS: Happy New Year! :)

@tkurbad
Author

tkurbad commented Jan 2, 2022

As expected, my oscilloscope doesn't show anything suspicious. Hsync and vsync run uninterrupted from the moment I switch on the Amiga up until the Workbench appears (at least without screenmode.prefs).
Nonetheless, the moment I press LCTRL-LALT-Del, the OSD boxes become (and stay) empty until the blue pill is reset.
I played with barriers, optimization levels etc., but nothing really helps.

Funnily enough, none of the non-flagged hotkeys exhibit that behavior. The example of switching between 4 ROMs using U(0) and U(1) works flawlessly. The instant I use one of the 'non-standard' hotkeys, the bad thing happens.

Everything seems to hint towards the output handling via snprintf corrupting the display buffer. However, this isn't supported by the fact that I can printk all the display struct values to the serial debug terminal, and it seems to be fine.

Then there are the very strict timing demands of the 36 MHz SPI1 output. So, another thing that might go wrong is the SPI DMA getting screwed up by things like out-of-order access or missed/long-running interrupts. However, I'm not sure how to test for that...

Edit: Replaced snprintf for the "OSD On/Off" notify strings by a static strcpy - and sure enough it works.
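The substitution described in this edit could look roughly like the sketch below. The function names, the buffer width, and the exact format string are assumptions for illustration; the thread does not show the firmware's actual notify code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NOTIFY_COLS 40  /* hypothetical width of the notify text buffer */

/* Formatted variant: builds the text through the printf machinery. */
static void notify_osd_state_fmt(char *buf, size_t len, int on)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "OSD %s", on ? "On" : "Off");
}

/* Static variant: copies a constant string, no formatting involved. */
static void notify_osd_state_static(char *buf, size_t len, int on)
{
    const char *msg = on ? "OSD On" : "OSD Off";
    assert(buf != NULL && len > strlen(msg));
    strcpy(buf, msg);
}
```

Both variants leave the same bytes in the buffer, so any difference in behavior would have to come from side effects such as stack usage or execution time rather than from the resulting text.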

I'll reverify tonight or tomorrow and if this really IS the issue, I'll prepare a PR.

@tkurbad
Author

tkurbad commented Jan 3, 2022

@keirf The issue keeps reappearing and I still can't make sense of it. If I change the code, it sometimes seems as if the problem might have gone away because it's not there immediately, but after two or three power cycles of the Amiga it manifests on the first press of LCTRL-LALT-Del. Perhaps you can think of something I'm not seeing.

Facts I might have established so far (I'm pretty sure of those):

  • It's not an SPI initialization problem. It stays the same even if the SPI channel is forcibly reset and/or set up again during each run of the main loop.
  • It's not a string/struct initialization/corruption problem. I'm now debugging the code with gdb and a stlink debugger. All data structures I look at after the OSD disappears are intact, be it cur_display, notify, i2c_display, etc.
  • It's not a hard crash. If I run the .elf file via gdb, it keeps running even when the OSD output is irrevocably lost.

Furthermore (I'm certain about these):

  • It's not triggered by "normal" hotkeys that are enabled by means of default_config.c, only the OSD on/off combination or my videoswitch thingy trigger it. Let's concentrate on the former first, though, as this issue is also present in flashfloppy-osd v1.8 and v1.9.

  • It's not a FlashFloppy issue. If I turn off the Amiga after the OSD texts disappeared while keeping the bluepill powered externally, when the Amiga is turned back on, I'm getting empty OSD boxes right from the start.

  • This might be important: Once the issue occurs, screen mode switches are no longer correctly recognized. I.e. if I flip the flicker fixer switch on the Amiga while the OSD is operating normally, the OSD output is always re-displayed with the correct H- and V-offsets. Once the OSD text has disappeared, the flashfloppy-osd obviously fails to recognize the mode change and the OSD box is squeezed onto the left border of the screen.

What I'm still not sure about:

  • Could this be related to the lazy memory/instruction ordering of the ARM?
  • Could this be due to the DMA buffer being corrupted somehow? How could I test for that?
  • Could this be due to IRQ(s) being disabled/missed? Not quite sure how to test for that either...
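One common way to test for a buffer being overwritten is to place guard words on either side of it and check them periodically. The following is only a sketch under that assumption: the struct, the buffer length, and the helper names are hypothetical and not taken from the firmware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GUARD_WORD  0xDEADBEEFu
#define DMA_BUF_LEN 64   /* hypothetical length, not the firmware's */

/* DMA buffer sandwiched between two guard words. */
struct guarded_dma_buf {
    uint32_t guard_lo;
    uint16_t data[DMA_BUF_LEN];
    uint32_t guard_hi;
};

/* Arm both guards; call once before the buffer is first used. */
static void guarded_buf_init(struct guarded_dma_buf *b)
{
    assert(b != NULL);
    b->guard_lo = GUARD_WORD;
    b->guard_hi = GUARD_WORD;
}

/* Returns 1 while both guards are intact, 0 once either was overwritten. */
static int guarded_buf_ok(const struct guarded_dma_buf *b)
{
    assert(b != NULL);
    return b->guard_lo == GUARD_WORD && b->guard_hi == GUARD_WORD;
}
```

Checking `guarded_buf_ok()` from the main loop would catch writes that run past either end of the buffer. It would not catch bad data written inside the buffer itself, so a clean result only rules out overruns.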

Any ideas where else to look? (Sorry for all the noise about this corner case issue, btw. ;-) )

Edit: Here's a picture of what the corrupted OSD looks like. In this state, the OSD box still resizes according to display.cols and display.rows, but never shows any text.
[Image: issue]

@keirf
Owner

keirf commented Jan 3, 2022

It seems likely that SPI, or the DMA which serves it, is somehow stuck. It is SPI output activity which drives the RGB output pin. Perhaps do things like log the DMA CNDTR register (this counts down as DMA transfers occur), SPI control/status registers, at suitably interesting times. Look for differences between when the box works, versus when it doesn't.
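The logging idea could be sketched as below: capture the registers into a snapshot at interesting times, print each snapshot as one line, and compare a working run against a broken one. The struct, the helper names, and the "stuck" heuristic are assumptions for illustration; how the registers are read on the target is left to the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define DMA_CCR_EN (1u << 0)  /* channel-enable bit of DMA_CCRx (STM32F1) */

/* One sample of the registers worth logging. */
struct spi_dma_snapshot {
    uint32_t dma_ccr;    /* DMA channel configuration */
    uint32_t dma_cndtr;  /* remaining transfers, counts down */
    uint32_t spi_cr1;
    uint32_t spi_cr2;
    uint32_t spi_sr;
};

/* Format a snapshot as one log line; returns the snprintf result. */
static int snapshot_format(char *buf, size_t len, const char *tag,
                           const struct spi_dma_snapshot *s)
{
    assert(buf != NULL && tag != NULL && s != NULL);
    return snprintf(buf, len,
                    "%s: CCR=%08lx CNDTR=%lu CR1=%04lx CR2=%04lx SR=%04lx",
                    tag, (unsigned long)s->dma_ccr,
                    (unsigned long)s->dma_cndtr, (unsigned long)s->spi_cr1,
                    (unsigned long)s->spi_cr2, (unsigned long)s->spi_sr);
}

/* Heuristic only: channel enabled, transfers pending, but the counter
 * did not move between two samples taken some time apart. */
static int dma_looks_stuck(const struct spi_dma_snapshot *a,
                           const struct spi_dma_snapshot *b)
{
    assert(a != NULL && b != NULL);
    return (b->dma_ccr & DMA_CCR_EN) && b->dma_cndtr != 0 &&
           a->dma_cndtr == b->dma_cndtr;
}
```

Two samples can show the same count by coincidence, so a single positive from `dma_looks_stuck()` proves little. A count that stays frozen over many samples is the more telling sign.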

Cortex M3 doesn't do much reordering, in most cases barriers aren't needed. There's a document on this somewhere... https://documentation-service.arm.com/static/5efefb97dbdee951c1cd5aaf?token=

DMA buffer corruption: Not sure that makes much sense, but perhaps DMA state machine corruption or bad state.

IRQs missed: Print dots in the IRQ handlers, or look for evidence of DMA setup in IRQ handlers via the sort of logging suggested at the start of this comment.
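As an alternative to printing from inside the handlers, which can itself disturb tight timing, each handler could bump a counter that the main loop reads and prints. This is a sketch with hypothetical names; the IRQ list does not reflect the firmware's real handlers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IRQ sources to watch. */
enum irq_id { IRQ_HSYNC, IRQ_VSYNC, IRQ_DMA_DONE, IRQ_ID_COUNT };

/* Incremented from interrupt context, read from the main loop. */
static volatile uint32_t irq_count[IRQ_ID_COUNT];

/* Call as the first statement of the corresponding handler. */
static inline void irq_mark(enum irq_id id)
{
    irq_count[id]++;
}

/* Main-loop side: how often did 'id' fire since the previous call?
 * '*last' holds the caller's previous reading for this id. Unsigned
 * subtraction keeps the delta correct across counter wraparound. */
static uint32_t irq_delta(enum irq_id id, uint32_t *last)
{
    uint32_t now, delta;

    assert(last != NULL);
    now = irq_count[id];
    delta = now - *last;
    *last = now;
    return delta;
}
```

A delta that drops to zero for one source while the others keep counting points at the handler that stopped running.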
