Romstage TPM measuring cause a 50 second delay on power on, S3 resume and reboot on all sandy/ivy devices except X230. #541

snmcmillan · 2019-03-21T00:42:05Z

Update: Completely redid this issue. Original text at the end.

Upon powering on, one is greeted with a long black screen. After waiting 50 seconds, they are finally greeted with a menu. After collaboration with @flammit, I've figured out that they delay comes from TPM measurement of the romstage. This delay is also seen on S3 resume and reboot.

This issue affects all sandy/ivy based devices EXCEPT X230.

This also leads me to wonder if romstage actually gets measured, as it seems like a timeout more than it being slow. Although I have not tested my theory yet, I worry that it may not actually be performing the measurements.

Original Issue Below:

Tested on X220 and T420 (#539), both exhibit this issue.

The time it takes to POST is around 1-1.5 minutes long. After that, the display is initialized and then heads loads almost instantly.

Payload isn't the issue, as ThinkLight is able to be toggled on/off anytime after POST, including during the Payload phase, thus isolating it to before the kernel loads.

It's seemingly a set delay as well, which makes me think that something's timing out. Doesn't seem to affect normal coreboot builds on T420 (I haven't run a previous coreboot build on my X220) or X230(t). I will build and flash heads onto my X230t to see if it exhibits the problem, and if it does, this problem is certainly platform independent.

Another bug report (#538) mentions a long pause before giving a beep code, although that build is likely faulty in itself, it still seemingly exhibits the same delay.

On stock firmware, when one runs a modified bios with TPM enabled, it has a long delay (not as long as this delay, if I remember correctly) and also gives the user a beep code indicating that the TPM sees a mismatched signature for the bios. But outside of that and, although incredibly unlikely, IME timing out is the only other thing that comes to my mind.

snmcmillan · 2019-03-21T03:08:49Z

I've done some more testing with my T420, and it takes roughly 50 seconds to wake up from S3 state.
Let me reboot to test power-on time, and I'll edit this message to add the time I get with my stopwatch.
Edit: Getting 50 seconds for the time between the power button lighting up to seeing the bluetooth indicator light up, signifying that POST has finished. This is almost certainly something waiting until it times out.

tlaurion · 2019-03-21T15:12:07Z

@SebastianMcMillan my builds at https://gitlab.com/tlaurion/heads/-/jobs results in a black screen on my test x230t and x230. Testing to see if laptop is starting up after longer delays.

Thought I screwed up something but no. I've done a clean build from fresh git clone of my branch locally on QubesOS fedora-29 based qube without problems?!?

EDIT: heads-ci_fedora-29 is causing the problem. Initrd is xz compressed while the kernel tries to uncompress with gzip.

Sent from my Galaxy S3 using FastHub-Libre

snmcmillan · 2019-03-21T19:15:06Z

I checked cbmem console logs and it seems that it is indeed related to management engine.

Edit: I've attached the on-boot messages as well as cbmem -c | more output.

@tlaurion is your cbmem output similar?

Also, does your repo have this issue as well?

tlaurion · 2019-03-23T23:39:21Z

@SebastianMcMillan nope. I edited precedent comment to not mislead. Had a kernel panic observable through x230-flash image, since that build doesn't ask coreboot to not have GFX unit and actually shows output to the console, where the x230 image just shuts it down until Linux is fully booted inside heads.

Will come back with my own screen captures once I kill the culprit and have a heads-ci that actually builds bootable images.

Sent from my Galaxy S3 using FastHub-Libre

paulmenzel · 2019-03-24T08:42:40Z

On 21.03.19 04:08, Sebastian McMillan wrote: I've done some more testing with my T420, and it takes roughly 50 > seconds to wake up from S3 state. Let me reboot to test power-on time, and I'll edit this message to add the time I get with my stopwatch.

If you can boot to GRUB or an OS, please run `cbmem -t` for time stamps and `cbmem -c` for the console log, and attach those. PS: That’s the beauty of coreboot, you have access to the firmware messages.

snmcmillan · 2019-03-24T15:20:49Z

The logs are attached above in an image file.

Edit: I do not get any timestamps found in the table. I manually configured it to print timestamps to console as well with make coreboot.menuconfig BOARD=t420 to no success.

On my X220, I get that delay right into a payload error (the constant, high pitch beep), that error I'm still trying to sort out, but I'll test on my X230t and see if it will work without delay.

Edit: Issue not present on X230t.

Another Edit: Keep in mind that my T420 is already running coreboot, and it isn't exhibiting these issues when it is running my builds. I am currently building 4.8.1 for the X220 and seeing if it's seeing any of these issues with clean coreboot.

Another Edit 2: X220 doesn't get delay when posting with clean coreboot 4.8.1.

snmcmillan · 2019-03-27T17:21:22Z

I've figured out what is causing this issue. 0030-sandybridge.patch is causing this issue on all Sandy bridge devices from the looks of it. When I compiled heads without said patch, there isn't any delay.

It looks like those changes were made to ensure the TPM was initialized, but it doesn't seem to make any difference.

snmcmillan · 2019-03-27T18:52:54Z

Issue is resolved by #539, which is awaiting merge.

snmcmillan · 2019-03-27T22:11:19Z

OK, so uhhh on second thought, that patch will not do. However, I've isolated it to these lines being the problem:

extern char _romstage, _eromstage;
tlcl_measure(1, &_romstage, &_eromstage - &_romstage);

Now to figure out how on earth this isn't an issue on the X230, but an issue on literally any other sandybridge/ivybridge platform.

snmcmillan · 2019-03-27T23:59:49Z

Instead of calling the pointers of _romstage and _eromstage, calling them not as pointers have resolved the delay. Whether or not this breaks the measurement of romstage is outside the scope of my knowledge.

tlaurion · 2019-03-28T04:44:47Z

I'll test those tomorrow. Have a x220 to test on.

snmcmillan · 2019-03-28T14:21:35Z

Well, I guess something happened and I didn't build with those changes, and when I build with changes, it needs the pointers. Comment them out and it should work, but you do not measure romstage, which would be a problem.

I cannot figure out why on earth this impacts all sandy/ivy boards except X230, as there are not any direct changes to X230 coreboot code that would affect it.

tlaurion · 2019-03-30T03:39:35Z

@SebastianMcMillan : not an option :)

I'll check the changes proposed by @flammit and play around in the next coming days.

snmcmillan · 2019-03-30T14:43:50Z

Now that I think about it, is it actually measuring romstage on non-X230 platforms? If it's a timeout, then it could already be skipping the romstage measurements. If you can debug over serial or look at cbmem, I would add a printk showing a hash of the romstage, and take a look at that on two different romstages.

snmcmillan · 2019-04-01T17:57:53Z

Edited original post to include current information. This problem is high on my priority list.

snmcmillan · 2019-04-09T23:12:51Z

Took a ton of fighting, but we finally got cbmem output.
VERBOSE.log

@flammit mentioning you so you can take a look at this.

edit: increasing CBMEM buffer size as recommended by @tlaurion, will reupload

snmcmillan · 2019-04-09T23:19:09Z

@osresearch you may be able to provide some insight as well.

snmcmillan · 2019-04-10T01:20:29Z

almost-complete-cbmem.log
@tlaurion @flammit @osresearch here is an almost complete log. Buffer size is max.
Edit: Just looked over it and it doesn't have all of romstage... defeating the purpose.

tlaurion · 2019-04-10T02:56:54Z

Beagle bone logs are even less interesting, they start too late.

output.txt
coreboot-x220.config which outputs to beaglebone and cbmem with maximum buffer size

On beaglebone black screwdriver:
fix network config to access it from network , then from ssh:

rmmod g_multi -f && modprobe g_dbgp
cat /dev/ttyGS0 > /tmp/output

tlaurion · 2019-04-10T03:50:48Z

@SebastianMcMillan time to hack MAX_MEM or delete cbmem limitation of 256mb....

Sent from my Galaxy S3 using FastHub-Libre

snmcmillan · 2019-04-10T10:58:38Z

hopefully it's as simple as increasing MAX_MEM from 1<<28 to 1<<56. I don't want to test it during the week as I do need my device for school, but I'll certainly test it out this weekend at the latest.

snmcmillan · 2019-04-10T13:40:38Z

I can increase max size up to 1<<30 at the max. Any higher and it doesn't build.

Update:
It boots with the change, but it doesn't help. I discovered CONFIG_PRERAM_CBMEM_CONSOLE_SIZE and found the default value for that. I am upping that and hoping for the best.

Update 2: I increased to the max value for preram cbmem and it didn't change anything

snmcmillan · 2019-04-10T17:42:56Z

I moved pcr 1 measurements towards the end of romstage, and it does in fact properly measure the romstage.

Now to figure out why it delays.

snmcmillan · 2019-04-11T14:04:58Z

If anyone has a T430 that they are able to flash coreboot with or one that already has coreboot, please build heads from my repo for the T430 (I have t430 and t430-flash configs just like the X230).

If the T430 has this issue, then it'll help narrow down the issue a lot.

tlaurion · 2019-04-11T15:53:30Z

@SebastianMcMillan : look here

Put this into coreboot config:
CONFIG_COLLECT_TIMESTAMPS=y

The issue linked above involved delays resulting from improper IFD. Might be the cause here too. Or not. Just a thought.

tlaurion · 2019-04-11T15:58:10Z

Multiple hypothesis/though exercices coming from the top of my head:

If it was TPM related, the issue wouldn't also be there at S3 resume, since the TPM is not "supposed" to be involved.
The MRC cache could, and yet again, the IFD parsing would occur to try to find the MRC cache, and if not found, retrain memory. There is an option to tackle in coreboot there. Might want to try that.

tlaurion · 2019-04-11T21:10:56Z

@SebastianMcMillan
Confirmed that the ME is the culprit on 68e2c76 with CONFIG_COLLECT_TIMESTAMPS=y set in coreboot config

cbmem -t | sort:


   0:1st timestamp                                     769
   1:start of romstage                                 66,579 (7,378,697,531,575,522)
   2:before ram initialization                         40,870,942 (7,378,697,574,641,644)
   3:after ram initialization                          91,412,517 (5,167,720)
   4:end of romstage                                   91,419,086 (6,568)
   8:starting to load ramstage                         91,430,782 (11,696)
   9:finished loading ramstage                         93,802,660 (93,801,890)
  10:start of ramstage                                 95,709,506 (1,906,846)
  15:starting LZMA decompress (ignore for x86)         91,435,896 (5,113)
  16:finished LZMA decompress (ignore for x86)         93,258,233 (1,822,336)
  30:device enumeration                                95,713,119 (3,612)
  40:device configuration                              95,987,757 (7,378,697,626,090,836)
  50:device enable                                     96,438,903 (96,372,323)
  60:device initialization                             96,526,939 (88,035)
  70:device setup done                                 97,938,300 (1,950,542)
  75:cbmem post                                        97,938,302 (2)
  80:write tables                                      97,938,304 (2)
  85:finalize chips                                    97,974,875 (36,570)
  90:load payload                                      97,974,878 (2)
  99:selfboot jump                                     99,380,742 (58,509,799)
 940:waiting for ME acknowledgement of raminit         86,244,797 (7,378,697,624,333,906)
 941:finished waiting for ME response                  91,394,711 (7,378,697,624,351,593)
21 entries total:
Total Time: 36,893,488,240,676,557

snmcmillan · 2019-04-11T21:15:39Z

So it was as I originally thought. But why would the romstage measuring cause the ME to freak out? I mean, I removed that and it would boot normally.

Nonetheless, I'm verifying this myself and taking a more in-depth look at this.

snmcmillan · 2019-04-11T21:46:25Z

Still doesn't fix it, even after I directly use me_cleaner to shrink me and modify ifd and ifdtool to get the layout.txt

aphick · 2019-05-29T23:43:50Z

Is there a known working commit? I’ve tried a much much older commit that works but comparing it with master is really not useful.

snmcmillan · 2019-05-31T08:17:44Z

I presume this began when heads was ported to coreboot 4.8.1, but I have not been able to test this.

…

On Wed, May 29, 2019 at 6:43 PM stayinside ***@***.***> wrote: Is there a known working commit? I’ve tried a much much older commit that works but comparing it with master is really not useful. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#541>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AFNTUNCUP3NKZ5VK54FGCKLPX4ILVANCNFSM4HACEFJA> .

csritu · 2019-06-01T04:56:36Z

Is there anything I can do to help with this? I have no familiarity with the code of either heads or coreboot, but I have an i5 x220 that I can test on, and I've disabled the measurements and gotten it to boot without the delay.

snmcmillan · 2019-06-06T01:52:11Z

@csritu I'll gladly have another tester if you're up for it, I need others that can do some testing. As long as you're able to do troubleshooting or are willing to learn. Please contact me on the u-root slack, I can discuss this further their in private.

snmcmillan · 2020-02-19T05:39:42Z

This issue is now solved in #578. Will close once changes make there way into master.

snmcmillan · 2020-02-19T19:35:41Z

Fixes merged, closing.

tlaurion · 2020-05-13T14:37:34Z

Just as a closing note for this ticket, the problem was not TPM, but CBFS size in coreboot configuration of the boards in question. The IFD needed to be modified so that CBFS size in coreboot matched the IFD produced. I think the most relevant ticket, outside of PR associated with x220 support, would be #307.

Coreboot got stucked, in an undebuggable state per USB debug dongles, where coreboot finally left its uncertainty 50 seconds later to continue even though IFD and CBFS regions didnt matched.

@SebastianMcMillan : if you want to complete this closing statement, that may be useful in the future.

snmcmillan · 2020-05-15T14:22:02Z

@tlaurion you've pretty much hit all the points regarding this issue.

snmcmillan mentioned this issue Mar 21, 2019

X220 gives endless beep error code. #538

Closed

snmcmillan changed the title ~~Incredibly slow POST times.~~ Incredibly slow POST times and S3 Resume times. Mar 21, 2019

snmcmillan mentioned this issue Mar 27, 2019

Add support for T420 and T430, enable FBWhiptail for X220. #539

Closed

tlaurion self-assigned this Mar 28, 2019

snmcmillan mentioned this issue Apr 1, 2019

x220 port not working as it is? #543

Closed

snmcmillan changed the title ~~Incredibly slow POST times and S3 Resume times.~~ Romstage TPM measuring cause a 50 second delay on power on, S3 resume and reboot on all sandy/ivy devices except X230. Apr 1, 2019

tlaurion mentioned this issue Apr 9, 2019

X220 cbmem temp debug requirements snmcmillan/heads#2

Merged

snmcmillan mentioned this issue Jun 7, 2019

T430 initial support #580

Closed

tlaurion mentioned this issue Oct 14, 2019

Heads/Coreboot specific work # #605

Closed

7 tasks

snmcmillan closed this as completed Feb 19, 2020

tlaurion mentioned this issue Mar 17, 2020

LibremKey/Nitrokey with x220 #690

Open

tlaurion mentioned this issue May 13, 2020

Is TPM or Intel hardware required for heads to work? #717

Closed

tlaurion mentioned this issue Jan 17, 2021

xxx0 maximized (t400 x200) #953

Closed

2 tasks

Romstage TPM measuring cause a 50 second delay on power on, S3 resume and reboot on all sandy/ivy devices except X230. #541

Romstage TPM measuring cause a 50 second delay on power on, S3 resume and reboot on all sandy/ivy devices except X230. #541

Comments

snmcmillan commented Mar 21, 2019 • edited Loading

snmcmillan commented Mar 21, 2019 • edited Loading

tlaurion commented Mar 21, 2019 • edited Loading

snmcmillan commented Mar 21, 2019 • edited Loading

tlaurion commented Mar 23, 2019 • edited Loading

paulmenzel commented Mar 24, 2019 via email

snmcmillan commented Mar 24, 2019 • edited Loading

snmcmillan commented Mar 27, 2019 • edited Loading

snmcmillan commented Mar 27, 2019

snmcmillan commented Mar 27, 2019

snmcmillan commented Mar 27, 2019

tlaurion commented Mar 28, 2019

snmcmillan commented Mar 28, 2019

tlaurion commented Mar 30, 2019

snmcmillan commented Mar 30, 2019

snmcmillan commented Apr 1, 2019

snmcmillan commented Apr 9, 2019 • edited Loading

snmcmillan commented Apr 9, 2019

snmcmillan commented Apr 10, 2019 • edited Loading

tlaurion commented Apr 10, 2019 • edited Loading

tlaurion commented Apr 10, 2019

snmcmillan commented Apr 10, 2019

snmcmillan commented Apr 10, 2019 • edited Loading

snmcmillan commented Apr 10, 2019 • edited Loading

snmcmillan commented Apr 11, 2019

tlaurion commented Apr 11, 2019

tlaurion commented Apr 11, 2019 • edited Loading

tlaurion commented Apr 11, 2019

snmcmillan commented Apr 11, 2019

snmcmillan commented Apr 11, 2019

aphick commented May 29, 2019

snmcmillan commented May 31, 2019 via email

csritu commented Jun 1, 2019

snmcmillan commented Jun 6, 2019

snmcmillan commented Feb 19, 2020

snmcmillan commented Feb 19, 2020

tlaurion commented May 13, 2020

snmcmillan commented May 15, 2020

snmcmillan commented Mar 21, 2019 •

edited

Loading

snmcmillan commented Mar 21, 2019 •

edited

Loading

tlaurion commented Mar 21, 2019 •

edited

Loading

snmcmillan commented Mar 21, 2019 •

edited

Loading

tlaurion commented Mar 23, 2019 •

edited

Loading

snmcmillan commented Mar 24, 2019 •

edited

Loading

snmcmillan commented Mar 27, 2019 •

edited

Loading

snmcmillan commented Apr 9, 2019 •

edited

Loading

snmcmillan commented Apr 10, 2019 •

edited

Loading

tlaurion commented Apr 10, 2019 •

edited

Loading

snmcmillan commented Apr 10, 2019 •

edited

Loading

snmcmillan commented Apr 10, 2019 •

edited

Loading

tlaurion commented Apr 11, 2019 •

edited

Loading