
Releases: open-power/skiboot

skiboot-5.5.0-rc2

25 May 00:58
skiboot-5.5.0-rc2
skiboot-5.5.0-rc2 Pre-release

skiboot-5.5.0-rc2 was released on Monday April 3rd 2017. It is the second
release candidate of skiboot 5.5, which will become the new stable release
of skiboot following the 5.4 release, first released November 11th 2016.

skiboot-5.5.0-rc2 contains all bug fixes as of :ref:skiboot-5.4.3
and :ref:skiboot-5.1.19 (the currently maintained stable releases).

For details on how the skiboot stable releases work, see :ref:stable-rules.

The current plan is to cut the final 5.5.0 by April 8th, with skiboot 5.5.0
being for all POWER8 and POWER9 platforms in op-build v1.16 (Due April 12th).
This is a short cycle as this release is mainly targeted towards POWER9
bringup efforts.

Following skiboot-5.5.0, we will move to a regular six week release cycle,
similar to op-build, but slightly offset to allow for a short stabilisation
period. Expected release dates and contents are tracked using GitHub milestones
and issues: https://github.com/open-power/skiboot/milestones

Over :ref:skiboot-5.5.0-rc1, we have the following changes:

NVLINK2

  • Introduce NPU2 support

    NVLink2 is a new feature introduced on POWER9 systems. It is an
    evolution of the NVLink1 feature included in POWER8+ systems but
    adds several new features including support for GPU address
    translation using the Nest MMU and cache coherence.

    Similar to NVLink1 the functionality is exposed to the OS as a series
    of virtual PCIe devices. However the actual hardware interfaces are
    significantly different which limits the amount of common code that
    can be shared between implementations in the firmware.

    This patch adds basic hardware initialisation and exposure of the
    virtual NVLink2 PCIe devices to the running OS.

  • npu2: Add OPAL calls for nvlink2 address translation services (see :ref:OPAL_NPU2)

    Adds three OPAL calls for interacting with NPU2 devices:
    :ref:OPAL_NPU_INIT_CONTEXT, :ref:OPAL_NPU_DESTROY_CONTEXT and
    :ref:OPAL_NPU_MAP_LPAR.

    These are used to set up and configure address translation services
    (ATS) for a process/partition on a given NVLink2 device.

POWER9

  • hdata/memory: ignore homer and occ reserved ranges

    We populate these from the HOMER BARs in the PBA directly. There's no
    need to take the hostboot supplied values so just ignore the
    corresponding reserved ranges.

  • hdata/vpd: Parse the OpenPOWER OPFR record

    Parse the OpenPOWER FRU VPD (OPFR) record on OpenPOWER instead
    of the VINI records.

  • hdata/vpd: Parse additional VINI records

    These records provide hardware version details, CCIN extension information,
    card type details and hardware characteristics of the FRU

  • hdata/cpu: account for p9 shared caches

    On P9 the L2 and L3 caches are shared between pairs of SMT=4 cores.
    Currently this is not accounted for when creating cache nodes in
    the device tree. This patch adds additional checking so that a
    cache node is only created for the first core in the pair and
    the second core will reference the cache correctly.

  • hdata: print backtraces on HDAT errors

  • hdat: ignore zero length reserves

    Hostboot can export reserved regions with a length of zero and these
    should be ignored rather than being turned into a reserved range. While
    we're here, fix a memory leak by moving the "too large" region check
    to before we allocate space for the label.

  • SLW: Add init for power9 power management

    This patch adds a new function to init cores for POWER9 power management.
    SPECIAL_WKUP_* SCOM registers, if set, can hold the cores from going into
    idle states. Hence, clear the PPM_SPECIAL_WKUP_HYP_REG SCOM register for
    each core during init. (This init is not required for MAMBO.)
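
The "hdata/cpu: account for p9 shared caches" entry above can be illustrated with a minimal sketch. It is not the skiboot implementation: the pairing rule (core id divided by two) and the node names are assumptions made up for the example. It only shows the idea that the first core of a pair creates the cache node and the second core references the existing one.

```python
def build_cache_nodes(core_ids, pair_of):
    """Create one cache node per core pair and map every core to its node.

    `pair_of` is a hypothetical callback returning the pair a core belongs to.
    """
    cache_nodes = {}    # pair id -> cache node name
    core_to_cache = {}  # core id -> cache node name it references
    for core in core_ids:
        pair = pair_of(core)
        if pair not in cache_nodes:
            # Only the first core seen in a pair creates the node.
            cache_nodes[pair] = f"l2-cache@{core:x}"  # illustrative name only
        # Every core in the pair, including the second, references that node.
        core_to_cache[core] = cache_nodes[pair]
    return cache_nodes, core_to_cache


# Assumed pairing for the example: cores (0, 1) and (2, 3) share caches.
nodes, refs = build_cache_nodes([0, 1, 2, 3], lambda core: core // 2)
```

With this pairing, four cores produce two cache nodes, and cores 0 and 1 end up referencing the same node.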

PCI

  • hw/phb3: Adjust ECRC on root port dynamically

    The Samsung NVMe adapter is lost when it's connected to a PMC 8546 PCIe
    switch, until ECRC is disabled on the root port. We found a similar issue
    previously when a Broadcom adapter was connected to the same PCIe switch
    part, and it was fixed by commit 60ce59c ("hw/phb3: Disable ECRC on Broadcom
    adapter behind PMC switch"). Unfortunately, that commit doesn't fix
    the lost Samsung NVMe adapter.

    This fixes the issue by disabling ECRC generation/check on the root port
    when PMC 8546 PCIe switch ports are found. This can be extended to
    other PCIe switches or endpoints in future: each PHB maintains a
    count of the PCI devices (currently PMC 8546 PCIe switch ports) which
    require ECRC to be disabled on the root port. ECRC is disabled when the
    first PMC 8546 switch port is probed and re-enabled when the last
    PMC 8546 switch port is destroyed (in the PCI hot remove scenario).
    Apart from the PHB's reinitialization after a complete reset, ECRC on
    the root port is otherwise untouched.

  • core/pci: Fix lost NVMe adapter behind PMC 8546 switch

    The NVMe adapter in the PCI topology below is lost. The root cause is
    that the presence bit on its PCI slot is not set, even though the PCIe
    link is up. The PCI core doesn't probe the adapter behind the slot,
    so the NVMe adapter is lost in this particular case.

    • PHB3 root port
    • PLX switch 8748 (10b5:8748)
    • PLX switch 9733 (10b5:9733)
    • PMC 8546 switch (11f8:8546)
    • NVMe adapter (1c58:0023)

    This fixes the issue by overriding the PCI slot presence bit with
    the PCIe link state bit.

  • hw/phb4: Locate AER capability position if necessary

  • core/pci: Disable surprise hotplug on root port

  • core/pci: Ignore PCI slot capability on root port

    We are creating a PCI slot on the root port, even where the PCI slot
    isn't supported by hardware. In this case, we shouldn't read the PCI
    slot capability from hardware. When bogus data is returned from the
    hardware, we will attempt to change the PCI slot's power state or enable
    surprise hotplug functionality. Neither can be accomplished
    without hardware support.

    This leaves the PCI slot's capability list as 0 if PCICAP_EXP_CAP_SLOT
    isn't set in hardware (pcie_cap + 0x2). Otherwise, the PCI slot's
    capability list is retrieved from hardware (pcie_cap + 0x14).

  • phb4: Default to PCIe GEN2 on DD1

    Default to PCIe GEN2 link speeds on DD1 for stability.

    This can be overridden using the nvram pcie-max-link-speed=4 parameter.

  • phb3/4: Set max link speed via nvram

    This adds an nvram parameter pcie-max-link-speed to configure the max
    speed of the PCIe link. This can be set from the petitboot prompt
    using: ::

    nvram -p ibm,skiboot --update-config pcie-max-link-speed=4

    This takes precedence over anything set in the device tree and is
    global to all PHBs.
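
The precedence described for pcie-max-link-speed above can be sketched as follows. This is an illustration only, not the phb3/phb4 code: the function name and arguments are hypothetical, and the ordering of the device tree value ahead of the platform default is an assumption. The only rules taken from the notes are that the nvram value wins over the device tree, and that DD1 defaults to GEN2.

```python
DD1_DEFAULT_GEN = 2  # from the notes: default to PCIe GEN2 on DD1


def effective_max_link_speed(nvram_value=None, device_tree_value=None,
                             platform_default=None):
    """Pick the max link speed generation, with nvram taking precedence."""
    if nvram_value is not None:
        # nvram values arrive as strings, e.g. "4" from pcie-max-link-speed=4.
        return int(nvram_value)
    if device_tree_value is not None:
        # Assumption: a device tree setting is used when nvram is silent.
        return int(device_tree_value)
    return platform_default


# nvram overrides both the device tree value and the DD1 default.
overridden = effective_max_link_speed("4", 3, DD1_DEFAULT_GEN)
# With nothing configured, a DD1 part stays at GEN2.
dd1_default = effective_max_link_speed(platform_default=DD1_DEFAULT_GEN)
```

Because the nvram setting is global, the same resolved value would apply to every PHB.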

Tests

  • Mambo/Qemu boot tests: expect (and fail) on checkstop

    This allows us to fail a lot faster if we checkstop.

skiboot-5.5.0-rc1

25 May 00:58
skiboot-5.5.0-rc1
skiboot-5.5.0-rc1 Pre-release

skiboot-5.5.0-rc1 was released on Tuesday March 28th 2017. It is the first
release candidate of skiboot 5.5, which will become the new stable release
of skiboot following the 5.4 release, first released November 11th 2016.

skiboot-5.5.0-rc1 contains all bug fixes as of :ref:skiboot-5.4.3
and :ref:skiboot-5.1.19 (the currently maintained stable releases).

For details on how the skiboot stable releases work, see :ref:stable-rules.

The current plan is to cut the final 5.5.0 by April 8th, with skiboot 5.5.0
being for all POWER8 and POWER9 platforms in op-build v1.16 (Due April 12th).
This is a short cycle as this release is mainly targeted towards POWER9
bringup efforts.

Following skiboot-5.5.0, we will move to a regular six week release cycle,
similar to op-build, but slightly offset to allow for a short stabilisation
period. Expected release dates and contents are tracked using GitHub milestones
and issues: https://github.com/open-power/skiboot/milestones

Over skiboot-5.4, we have the following changes:

New Platforms

  • SuperMicro's (SMC) P8DNU: An astbmc based POWER8 platform

  • Add a generic platform to help with bringup of new systems.

  • Four POWER9 based systems (NOTE: All POWER9 systems should be considered
    for bringup use only at this point):

    • Romulus
    • Witherspoon (a POWER9 system with NVLink2 attached GPUs)
    • Zaius (OpenCompute platform, also known as "Barreleye 2")
    • ZZ (FSP based system)

New features

  • System reset IPI facility and Mambo implementation
    Add an opal call :ref:OPAL_SIGNAL_SYSTEM_RESET which allows system reset
    exceptions to be raised on other CPUs and act as an NMI IPI. There
    is an initial simple Mambo implementation, but allowances are made
    for a more complex hardware implementation.

    The Mambo implementation is based on the RFC implementation for POWER8
    hardware (see https://patchwork.ozlabs.org/patch/694794/) which we hope
    makes it into a future release.

    This implements an in-band NMI equivalent.

  • add CONTRIBUTING.md, ensuring that people new to the project have a one-stop
    place to find out how to get started.

  • interrupts: Add optional name for OPAL interrupts

    This adds the infrastructure for an interrupt source to provide
    a name for an interrupt directed toward OPAL. Those names will
    be put into an "opal-interrupts-names" property which is a
    standard DT string list corresponding 1:1 with the "opal-interrupts"
    property. PSI interrupts get names, and this is visible in Linux
    through /proc/interrupts

  • platform: add OPAL_REBOOT_FULL_IPL reboot type

    There may be circumstances in which a user wants to force a full IPL reboot
    rather than using fast reboot. Add a new reboot type, OPAL_REBOOT_FULL_IPL,
    that disables fast reboot. On platforms which don't support fast reboot,
    this will be equivalent to a normal reboot.

  • phb3: Trick to allow control of the PCIe link width and speed

    This implements a hook inside OPAL that catches 16 and 32 bit writes
    to the link status register of the PHB.

    It allows you to write a new speed or a new width, and OPAL will then
    cause the PHB to renegotiate.

    Example:

    First read the link status on PHB4: ::

    setpci -s 0004:00:00.0 0x5a.w
    a103
    

    It's at x16 Gen3 speed (8GT/s)

    Bits 0x0ff0 are the width and 0x000f the speed. The width can be
    1 to 16 and the speed 1 to 3 (2.5, 5 and 8GT/s).

    Then try to bring it down to 1x Gen1 : ::

    setpci -s 0004:00:00.0 0x5a.w=0xa011
    

    Observe the result in the PHB: ::

    / # lspci -s 0004:00:00.0 -vv
    0004:00:00.0 PCI bridge: IBM Device 03dc (prog-if 00 [Normal decode])
    .../...
    LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk- DLActive+ BWMgmt- ABWMgmt+
    

    And in the device: ::

    / # lspci -s 0004:01:00.0 -vv
    .../...
    LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
    
  • core/init: Add hdat-map property to OPAL node.

    Exports the HDAT heap to the OS. This allows the OS to view the HDAT
    area directly, without having to use getmemproc.

  • Add a generic platform: If /bmc in device tree, attempt to init one
    For the most part, this gets us somewhere on some OpenPOWER systems
    before there's a platform file for that machine.

    Useful in bringup only, and marked as such with scary looking log
    messages.
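
The link status encoding used in the phb3 link width and speed example above (bits 0x0ff0 for the width, bits 0x000f for the speed) can be worked through with a short sketch. The helper names are hypothetical and this is not OPAL code. It only reproduces the arithmetic behind the 0xa103 and 0xa011 values shown in that example.

```python
WIDTH_MASK = 0x0ff0
SPEED_MASK = 0x000f
SPEEDS_GT_S = {1: 2.5, 2: 5.0, 3: 8.0}  # speed field -> GT/s, per the notes


def decode_link_status(value):
    """Split a 16-bit link status word into (width, speed in GT/s)."""
    width = (value & WIDTH_MASK) >> 4
    speed = value & SPEED_MASK
    return width, SPEEDS_GT_S.get(speed)


def encode_link_request(current, width, speed):
    """Build the word to write back, keeping the bits outside both fields."""
    if not 1 <= width <= 16:
        raise ValueError("width must be between 1 and 16")
    if speed not in SPEEDS_GT_S:
        raise ValueError("speed must be 1, 2 or 3")
    preserved = current & 0xffff & ~(WIDTH_MASK | SPEED_MASK)
    return preserved | (width << 4) | speed


print(decode_link_status(0xa103))              # (16, 8.0): x16 at 8GT/s
print(hex(encode_link_request(0xa103, 1, 1)))  # 0xa011: request x1 at 2.5GT/s
```

Keeping the upper bits unchanged is why the example writes 0xa011 rather than 0x0011.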

Core

  • asm: Don't try to set LPCR:LPES1 on P8 and P9, the bit doesn't exist.

  • pci: Add a framework for quirks

    In future we may want to be able to do fixups for specific PCI devices in
    skiboot, so add a small framework for doing this.

    This is not intended for the same purposes as quirks in the Linux kernel,
    as the PCI devices that quirks can match for in skiboot are not properly
    configured. This is intended to enable having a custom path to make
    changes that don't directly interact with the PCI device, for example
    adding device tree entries.

  • hw/slw: fix possible NULL dereference

  • slw: Print enabled stop states on boot

  • uart: Fix Linux pass-through policy, provide NVRAM override option

  • libc/stdio/vsnprintf.c: add explicit fallthrough, this silences a recent
    (GCC 7.x) warning

  • init: print the FDT blob size in decimal

  • init: Print some more info before booting linux

    The kernel command line from nvram and the stdout-path are
    useful to know when debugging console related problems.

  • Makefile: Disable stack protector due to gcc problems

    Depending on how it was built, gcc will use the canary from a global
    (works for us) or from the TLS (doesn't work for us and accesses
    random stuff instead).

    Fixing that would be tricky. There are talks of adding a gcc option
    to force use of globals, but in the meantime, disable the stack
    protector.

  • Stop using 3-operand cmp[l][i] for latest binutils
    Since a5721ba270, binutils does not support 3-operand cmp[l][i].
    This adds (previously optional) parameter L.

  • buddy: Add a simple generic buddy allocator

  • stack: Don't recurse into __stack_chk_fail

  • Makefile: Use -ffixed-r13
    We use r13 for our own stuff, make sure it's properly fixed

  • Always set ibm,occ-functional-state correctly

  • psi: fix the xive registers initialization on P8, which seems to be fine
    for real HW but causes a lot of pain under qemu

  • slw: Set PSSCR value for idle states

  • Limit number of "Poller recursion detected" errors to display

    In some error conditions, we could spiral out of control on this
    and spend all of our time printing the exact same backtrace.

    Limit it to 16 times, because 16 is a nice number.

  • slw: do SLW timer testing while holding xscom lock

    We add some routines that let a caller get the xscom lock once and
    then do a bunch of xscoms while holding it.
    In some situations without this, it could take long enough to get
    the xscom lock that the 1ms timeout would expire and we'd falsely
    think the SLW timer didn't work when in fact it did.

  • wait_for_resource_loaded: don't needlessly sleep for 5ms

  • run pollers in cpu_process_local_jobs() if running job synchronously

  • fsp: Don't recurse pollers in ibm_fsp_terminate

  • chiptod: More hardening against -1 chip ID

  • interrupts: Rewrite/correct doc for opal_set/get_xive

  • cpu: Don't enable nap mode/PM mode on non-P8

  • platform: Call generic platform probe and init UART there

  • psi: Don't register more interrupts than the HW supports

  • psi: Add DT option to disable LPC interrupts
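
The "Poller recursion detected" limit in the Core list above amounts to a capped counter around the warning. A minimal sketch, assuming a hypothetical helper class rather than the actual skiboot code, with the limit of 16 taken from the notes:

```python
class WarningLimiter:
    """Emit a repeated warning at most `limit` times (hypothetical helper)."""

    def __init__(self, limit=16):
        self.limit = limit
        self.emitted = 0

    def warn(self, message, emit=print):
        """Emit `message` unless the cap is reached; return True if emitted."""
        if self.emitted >= self.limit:
            return False
        self.emitted += 1
        emit(message)
        return True


# Simulate an error condition that triggers the warning 100 times.
limiter = WarningLimiter()
seen = []
for _ in range(100):
    limiter.warn("Poller recursion detected.", emit=seen.append)
```

After the loop, `seen` holds 16 messages, so the log is no longer flooded with identical backtraces.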

I2C and TPM

  • p8i2c: Use calculated poll_interval when booting OPAL
    Otherwise we'd default to 2 seconds (TIMER_POLL) during boot on
    chips with a functional i2c interrupt, leading to slow i2c
    during boot (or hitting timeouts instead).

  • i2c: Add i2c_run_req() to crank the state machine for a request

  • tpm_i2c_nuvoton: work out the polling time using mftb()

  • tpm_i2c_nuvoton: handle errors after reading the tpm fifo

  • tpm_i2c_nuvoton: cleanup variables in tpm_read_fifo()

  • tpm_i2c_nuvoton: handle errors after writing the tpm fifo

  • tpm_i2c_nuvoton: cleanup variables in tpm_write_fifo()

  • tpm_i2c_nuvoton: handle errors after writing sts.commandReady in step 5

  • tpm_i2c_nuvoton: handle errors after writing sts.go

  • tpm_i2c_nuvoton: handle errors after checking the tpm fifo status

  • tpm_i2c_nuvoton: return burst_count in tpm_read_burst_count()

  • tpm_i2c_nuvoton: isolate the code that handles the TPM_TIMEOUT_D timeout

  • tpm_i2c_nuvoton: handle errors after reading sts.commandReady

  • tpm_i2c_nuvoton: add tpm_status_read_byte()

  • tpm_i2c_nuvoton: add tpm_check_status()

  • tpm_i2c_nuvoton: rename defines to shorter names

  • tpm_i2c_interface: decouple rc from being done with i2c request

  • tpm_i2c_interface: set timeout before each request

  • i2c: Add nuvoton quirk, disallowing i2cdetect as it locks TPM

    p8-i2c reset things manually in some error conditions

  • stb: create-container and wrap skiboot in Secure/Trusted Boot container

    We produce UNSIGNED skiboot.lid.stb and skiboot.lid.xz.stb as build
    artifacts.

    These are suitable blobs for flashing onto Trusted Boot enabled op-build
    builds WITH the secure boot jumpers ON (i.e. NOT in secure mode).
    It's just enough of the Secure and Trusted Boot container format to
    make Hostboot behave.

PCI

  • core/pci: Support SRIOV VFs

    Currently, skiboot can't see SRIOV VFs, which causes some problems:
    the device initialization logic (phb->ops->device_init())
    isn't applied to VFs, meaning we have to maintain a duplicate
    mechanism in the kernel for VFs only. This makes the code harder to
    maintain and prone to falling out of synchronization.

    This was motivated by bug reported by Carol: The VF's Max Payload
    Size (MPS) isn't matched with PF's...


skiboot-5.4.3

02 Feb 07:33
skiboot-5.4.3

skiboot-5.4.3 was released on Monday January 16th, 2017. It replaces
:ref:skiboot-5.4.2 as the current stable release.

Over :ref:skiboot-5.4.2, we have a small number of bug fixes:

  • Makefile: Disable stack protector due to gcc problems
  • Makefile: Use -ffixed-r13.
    We use r13 for our own stuff, make sure it's properly fixed
  • phb3: Lock the PHB on set_xive callbacks
  • arch_flash_arm: Don't assume mtd labels are short
  • Stop using 3-operand cmp[l][i] for latest binutils
  • hw/phb3: fix error handling in complete reset

skiboot-5.1.19

02 Feb 07:32
skiboot-5.1.19

skiboot-5.1.19

skiboot-5.1.19 was released on Monday 16th January 2017.

skiboot-5.1.19 is the 20th stable release of 5.1; it follows skiboot-5.1.18
(which was released 26th August 2016).

This release contains a few minor bug fixes.

Changes are:

Generic:

  • Makefile: Disable stack protector due to gcc problems
  • stack: Don't recurse into __stack_chk_fail
  • Makefile: Use -ffixed-r13
    We did not find evidence of this ever being a problem, but this fix
    is good and preventative.
  • Limit number of "Poller recursion detected" errors to display
    In some error conditions, we could spiral out of control on this
    and spend all of our time printing the exact same backtrace.
    Limit it to 16 times, because 16 is a nice number.

FSP based Systems:

  • fsp: Don't recurse pollers in ibm_fsp_terminate
    If we were to terminate in a poller, we'd call op_display() which
    called pollers which hit the recursive poller warning, which ended
    in not much fun at all.

PCI:

  • hw/phb3: set PHB retry state correctly when fresetting during a creset
  • phb3: Lock the PHB on set_xive callbacks
    Those are called by the interrupts core and thus skip the locking
    implicit in the PCI opal calls.
  • hw/{phb3, p7ioc}: Return success for freset on empty PHB
    OPAL_CLOSED is returned when a fundamental reset is issued on a
    PHB that doesn't have subordinate devices (root port excluded).
    The kernel raises an error message, which is unnecessary. This
    returns OPAL_SUCCESS in this case to avoid the error message.
  • hw/phb3: fix error handling in complete reset
    During a complete reset, when we get a timeout waiting for pending
    transaction in state PHB3_STATE_CRESET_WAIT_CQ, we mark the PHB as broken
    and return OPAL_PARAMETER.
    Change the return code to OPAL_HARDWARE which is way more sensible, and set
    the state to PHB3_STATE_FENCED so that the kernel can retry the complete
    reset.

skiboot-5.4.2

02 Feb 07:32
skiboot-5.4.2

skiboot-5.4.2 was released on Friday December 2nd 2016. It replaces
:ref:skiboot-5.4.1 as the current stable release.

Over :ref:skiboot-5.4.1, we have two bug fixes exclusively aimed at machines
with TPMs:

  • i2c: Add nuvoton TPM quirk, disallowing i2cdetect as it can hard lock the TPM
  • p8-i2c: improve the I2C reset code path, which solves getting stuck resetting the i2c engine

skiboot 5.4.1

29 Nov 07:57
skiboot-5.4.1

skiboot-5.4.1 was released on Tuesday November 29th 2016. It replaces
skiboot-5.4.0 as the current stable release.

Over skiboot-5.4.0, we have a few changes:

  • Nuvoton i2c TPM driver: bug fixes and improvements, especially around
    timeouts and error handling.
  • Limit number of "Poller recursion detected" errors to display.
    In some error conditions, we could spiral out of control on this
    and spend all of our time printing the exact same backtrace.
  • slw: do SLW timer testing while holding xscom lock.
    In some situations without this, it could take long enough to get
    the xscom lock that the 1ms timeout would expire and we'd falsely
    think the SLW timer didn't work when in fact it did.
  • p8i2c: Use calculated poll_interval when booting OPAL.
    Otherwise we'd default to 2 seconds (TIMER_POLL) during boot on
    chips with a functional i2c interrupt, leading to slow i2c
    during boot (or hitting timeouts instead).
  • i2c: More efficiently run TPM I2C operations during boot, avoiding hitting
    timeouts
  • fsp: Don't recurse pollers in ibm_fsp_terminate

skiboot 5.4.0

11 Nov 05:47
skiboot-5.4.0

skiboot-5.4.0 was released on Friday November 11th 2016. It is the new stable
skiboot release, taking over from the 5.3.x series (first released August 2nd,
2016). It comes after four release candidates, which have helped to shake out
a few issues.

skiboot-5.4.0 contains all bug fixes as of :ref:skiboot-5.3.7
and :ref:skiboot-5.1.18 (the currently maintained stable releases).

Skiboot 5.4.x becomes the new stable release. For details on how the skiboot
stable releases work, see :ref:stable-rules.

Over :ref:skiboot-5.4.0-rc4, we have a few changes:

  • libstb: bump up the byte timeout for tpm i2c requests

    This bumps up the byte timeout for tpm i2c requests from 10ms to 30ms.
    Some p8dtu systems are getting i2c request timeouts.

  • external/pflash: Perform the correct cleanup when -F is used to operate on
    a file.

  • Add SuperMicro p8dtu1u and p8dtu2u platforms

  • Revert "core/ipmi: Set interrupt-parent property".
    This reverts commit d997e48 (introduced
    in 5.4.0-rc1)

    A problem was found with pre-4.2 Linux kernels where a spurious WARNING
    would be emitted. This change doesn't matter enough to scare users
    so we can just revert it. ::

      Warning was:
      [    0.947741] irq: irq-62==>hwirq-0x3e mapping failed: -22
      [    0.947793] ------------[ cut here ]------------
      [    0.947838] WARNING: at kernel/irq/irqdomain.c:485
    
  • libflash/libffs: Fix possible NULL dereference

Previous Release Candidates

There were four release candidates for skiboot 5.4.0:

  • :ref:skiboot-5.4.0-rc4
  • :ref:skiboot-5.4.0-rc3
  • :ref:skiboot-5.4.0-rc2
  • :ref:skiboot-5.4.0-rc1

Changes since skiboot 5.3

Over skiboot-5.3, we have the following changes:

New Features

  • Add SuperMicro p8dtu1u and p8dtu2u platforms

  • Initial Trusted Boot support (see :ref:stb-overview).
    There are several limitations with this initial release:

    • Only Nuvoton TPM 2.0 is supported
    • Requires hardware rework on late revision Habanero or Firestone boards
      in order to install TPM.
    • Add i2c Nuvoton TPM 2.0 Driver
    • romcode driver for POWER8 secure ROM
    • See Device tree docs: :ref:device-tree/tpm and :ref:device-tree/ibm,secureboot
    • See :ref:stb-overview
  • Support ibm,skiboot NVRAM partition with skiboot configuration options.

    • These should generally only be used if you either completely know what
      you are doing or need to work around a skiboot bug. They are not
      intended for end users and are explicitly NOT ABI.
    • Add support for supplying the kernel boot arguments from the bootargs
      configuration string in the ibm,skiboot NVRAM partition.
    • Enabling the experimental fast reset feature is done via this method.
  • Add support for nap mode on P8 while in skiboot

    • While nap has been exposed to the Operating System since day 1, we have
      not utilized low power states when in skiboot itself, leading to higher
      power consumption during boot.
      We only enable the functionality after the 0x100 vector has been
      patched, and we disable it before transferring control to Linux.
  • libflash: add 128MB MX66L1G45G part

  • Pointer validation of OPAL API call arguments.

    • If the kernel called an OPAL API with vmalloc'd address
      or any other address range in real mode, we would hit
      a problem with aliasing. Since the top 4 bits are ignored
      in real mode, pointers from 0xc.. and 0xd.. (and other ranges)
      could collide and lead to hard to solve bugs. This patch
      adds the infrastructure for pointer validation and a simple
      test case for testing the API
    • The checks validate pointers sent in using opal_addr_valid()
  • Fast reboot for P8

    This makes reboot take an awful lot less time, somewhere between four
    and ten times faster than a full IPL. It is currently experimental and not
    enabled by default.
    You can enable the experimental support via nvram option: ::

    nvram -p ibm,skiboot --update-config experimental-fast-reset=feeling-lucky

    WARNING: While we think we've managed to work out or around most of
    the kinks with fast-reset, we are not enabling it by default in 5.4.

    Notably, fast reset will not happen in the following scenarios:

    • platform error

      Most of the time, if we're rebooting due to a platform error, we should
      trigger a checkstop. However, if we haven't been told what we should do
      to trigger a checkstop (e.g. on an FSP machine), then we should still
      fail to fast-reboot.

      So, fast-reboot is disabled in the OPAL_CEC_REBOOT2 code path
      for the OPAL_REBOOT_PLATFORM_ERROR reboot type.

    • FSP code update

    • Unrecoverable HMI

    • A PHB is in CAPI mode

      If a PHB is in CAPI mode, we cannot safely fast reboot - the PHB will be
      fenced during the reboot resulting in major problems when we load the new
      kernel.

      In order to handle this safely, we need to disable CAPI mode before
      resetting PHBs during the fast reboot. However, we don't currently support
      this.

      In the meantime, when fast rebooting, check if there are any PHBs with a
      CAPP attached, and if so, abort the fast reboot and revert to a normal
      reboot instead.
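
The aliasing problem behind the pointer validation entry above can be shown with a little arithmetic. The sketch below is not opal_addr_valid() and does not claim to reproduce its checks. It only demonstrates the statement from the notes that the top 4 bits of an address are ignored in real mode, so pointers from the 0xc.. and 0xd.. ranges can collide. The helper names and sample addresses are made up.

```python
# Top 4 bits of a 64-bit address are ignored in real mode (per the notes).
REAL_MODE_MASK = (1 << 60) - 1


def real_mode_address(addr):
    """Return the address as seen once the top 4 bits are dropped."""
    return addr & REAL_MODE_MASK


def aliases(a, b):
    """True if two distinct pointers resolve to the same real mode address."""
    return a != b and real_mode_address(a) == real_mode_address(b)


linear_ptr = 0xc000000000001000   # illustrative 0xc.. address
vmalloc_ptr = 0xd000000000001000  # illustrative 0xd.. (vmalloc'd) address

print(hex(real_mode_address(linear_ptr)))  # 0x1000
print(aliases(linear_ptr, vmalloc_ptr))    # True
```

Two different kernel pointers landing on the same real mode address is the kind of hard to solve bug the validation is meant to catch early.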

Documentation

There have been a number of documentation fixes this release. Most prominent
is the switch to Sphinx (from the Python project) and reStructuredText (RST)
as the documentation format. RST and Sphinx enable production of pretty
documentation in both HTML and PDF formats while remaining readable in their
raw form to those with no knowledge of RST.

You can build an HTML site by doing the following: ::

cd doc/
make html

As always, documentation patches are very, very welcome as we attempt to
document the OPAL API, the device tree bindings and important parts of
OPAL internals.

We would like the Device Tree documentation to follow the style that can be
included in the Device Tree Specification.

General

  • Make console-log time more readable: seconds rather than timebase
    Log format is now [SECONDS.(tb%512000000),LEVEL]

  • Flash (PNOR) code improvements

    • flash: Make size 64 bit safe
      This makes the size of flash 64 bit safe so that we can have flash
      devices greater than 4GB. This is especially useful for mambo disks
      passed through to Linux.
    • core/flash.c: load actual partition size
      We are downloading 0x20000 bytes from PNOR for CAPP, but currently the
      CAPP lid is only 40K.
    • flash: Rework error paths and messages for multiple flash controllers
      Now that we have mambo bogusdisk flash, we can have many flash chips.
      This is resulting in some confusing output messages.
  • core/init: Fix "failure of getting node in the free list" warning on boot.

  • slw: improve error message for SLW timer stuck

  • Centaur / XSCOM error handling

    • print message on disabling xscoms to centaur due to many errors
    • Mark centaur offline after 10 consecutive access errors
  • XSCOM improvements

    • xscom: Map all HMER status codes to OPAL errors
    • xscom: Initialize the data to a known value in xscom_read
      In case of error, don't leave the data random. It helps debugging when
      the user fails to check the error code. This happens due to a bug in the
      PRD wrapper app.
    • chip: Add a quirk for when core direct control XSCOMs are missing
  • p8-i2c: Don't crash if a centaur errored out

  • cpu: Make endian switch message more informative

  • cpu: Display number of started CPUs during boot

  • core/init: ensure that HRMOR is zero at boot

  • asm: Fix backtrace for unexpected exception

  • cpu: Remove pollers calling heuristics from cpu_wait_job
    This will be handled by time_wait_ms(). Also remove a useless
    smt_medium().
    Note that this introduces a difference in behaviour: time_wait
    will only call the pollers on the boot CPU while cpu_wait_job()
    could call them on any. However, I can't think of a case where
    this is a problem.

  • cpu: Remove global job queue
    Instead, target a specific CPU for a global job at queuing time.
    This will allow us to wake up the target using an interrupt when
    implementing nap mode.
    The algorithm used is to look for idle primary threads first, then
    idle secondaries, and finally the least loaded thread. If nothing can
    be found, we fall back to a synchronous call.

  • lpc: Log LPC SYNC errors as unrecoverable ones for manufacturing

  • lpc: Optimize SerIRQ dispatch based on which PSI IRQ fired

  • interrupts: Add new source ->attributes() callback
    This allows a given source to provide per-interrupt attributes
    such as whether it targets OPAL or Linux and its estimated
    frequency.

    The former allows us to get rid of the double set of ops used to
    decide which interrupts go where on some modules like the PHBs
    and the latter will be eventually used to implement smart
    caching of the source lookups.

  • opal/hmi: Fix a TOD HMI failure during a race condition.

  • platform: Add BT to Generic platform
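
The console log format listed at the top of the General section, [SECONDS.(tb%512000000),LEVEL], splits the timebase value into whole seconds and a remainder. A minimal sketch of that arithmetic follows. The function names are hypothetical, and the zero padding of the remainder is an assumption, since the notes do not give field widths.

```python
TB_FREQ = 512_000_000  # timebase ticks per second, from the log format above


def split_timebase(tb):
    """Return (whole seconds, remaining ticks) for a timebase value."""
    return tb // TB_FREQ, tb % TB_FREQ


def format_log_prefix(tb, level):
    """Format a prefix in the [SECONDS.(tb%512000000),LEVEL] style."""
    seconds, remainder = split_timebase(tb)
    # Assumption: remainder padded to 9 digits so columns line up.
    return f"[{seconds}.{remainder:09d},{level}]"


print(format_log_prefix(3 * TB_FREQ + 42, 5))  # [3.000000042,5]
```

Note that the part after the dot is a tick count, not a decimal fraction of a second.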

NVRAM

  • Support ibm,skiboot partition for skiboot specific configuration options
  • flash: Size NVRAM based on ECC for OpenPOWER platforms
    If NVRAM has ECC (as per the ffs header in the PNOR) then the actual size
    of the partition is less than reported by the ffs header.
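
As a rough illustration of why an ECC-protected NVRAM partition is smaller than the ffs header reports, the sketch below assumes a layout of 8 data bytes followed by 1 ECC byte. That ratio is an assumption for the example and is not stated in these notes; the function name is hypothetical.

```python
def size_without_ecc(raw_size, data_bytes=8, ecc_bytes=1):
    """Usable bytes in a partition of `raw_size` bytes that carries ECC.

    Assumes each chunk holds `data_bytes` of data plus `ecc_bytes` of ECC.
    """
    chunk = data_bytes + ecc_bytes
    return (raw_size // chunk) * data_bytes


reported = 0x90000                  # size taken from the ffs header (example)
usable = size_without_ecc(reported)
print(hex(usable))                  # 0x80000
```

Under this assumption, one ninth of the reported size is consumed by ECC bytes.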

NVLink/NPU

  • Fix reserved PE#
  • NPU bdfn allocation bugfix
  • Fix bad PE number check
    NPUs have 4 PEs which are zero indexed, so {0, 1, 2, 3}. A bad PE number
    check in npu_err_inject checks if the PE number is greater than 4 as a
    fail case, so it would wrongly perform operations on a non-existent PE 4.
  • Use PCI virtual device
  • assert the NPU irq min is aligned.
  • program NPU BUID reg properly
  • npu: reword "error" to indicate it's actually a warning
    Incorrect FWTS annotation.
    Without this patch, you get spurious FirmWare Test Suite (FWTS) warnings
    about NVLink not working on machines that aren't fully populated with
    GPUs.
  • external: NP...
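
The "Fix bad PE number check" entry in the list above is an off-by-one bounds check. A minimal sketch of the difference, using hypothetical function names rather than the npu_err_inject code:

```python
NPU_NUM_PES = 4  # PEs are zero indexed: 0, 1, 2, 3


def pe_accepted_by_bad_check(pe):
    """Old behaviour: only PE numbers greater than 4 are rejected."""
    return not pe > NPU_NUM_PES


def pe_accepted_by_fixed_check(pe):
    """Corrected bounds check: valid PE numbers are 0 to NPU_NUM_PES - 1."""
    return 0 <= pe < NPU_NUM_PES


print(pe_accepted_by_bad_check(4))    # True: PE 4 slips through
print(pe_accepted_by_fixed_check(4))  # False
```

Both checks agree for PE numbers 0 to 3 and differ only at 4, which is exactly the non-existent PE mentioned above.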

skiboot-5.4.0-rc4

08 Nov 06:57
skiboot-5.4.0-rc4
skiboot-5.4.0-rc4 Pre-release

skiboot-5.4.0-rc4 was released on Tuesday November 8th 2016. It is the
fourth (and hopefully final) release candidate of skiboot 5.4, which will
become the new stable release of skiboot following the 5.3 release, first
released August 2nd 2016.

skiboot-5.4.0-rc4 contains all bug fixes as of skiboot-5.3.7
and skiboot-5.1.18 (the currently maintained stable releases).

For how the skiboot stable releases work, see https://github.com/open-power/skiboot/blob/skiboot-5.4.0-rc4/doc/stable-skiboot-rules.rst for details.

Since this is a release candidate, it should NOT be put into production.

With this release candidate, I'm hoping that it's the last one, and that within
the week we're able to tag a final 5.4.0 release. There is one bit of code I'm
hoping to merge in before the final 5.4.0, and that's the p8dtu platform
definition. The aim is for skiboot-5.4.x to be in op-build v1.13, which is due
by November 23rd 2016.

Over skiboot-5.4.0-rc3, we have a few changes:

  • Add BMC platform to enable correct OEM IPMI commands

    An out of tree platform (p8dtu) uses a different IPMI OEM command
    for IPMI_PARTIAL_ADD_ESEL. This exposed some assumptions about the BMC
    implementation in our core code.

    Now, with platform.bmc, each platform can dictate (or detect) the BMC
    that is present. We allow it to be set at runtime rather than purely
    statically in struct platform as it's possible to have differing BMC
    implementations on the one machine (e.g. AMI BMC or OpenBMC).

  • hw/ipmi-sensor: Fix setting of firmware progress sensor properly.

    On FSP systems, OPAL was incorrectly setting firmware status
    on a sensor id "00" which doesn't exist.

  • pflash: remove stray d in from info message

  • libflash/pflash: support whole chip erase on mtd access

  • boot_test: fix typo in console message

  • core/pci: Fix criteria in pci_cfg_reg_filter(), i.e. NVLink didn't work.

  • Remove KERNEL_COMMAND_LINE mention from config.h

    We removed the functionality but not the define.

skiboot-5.4.0-rc3

02 Nov 08:37
skiboot-5.4.0-rc3
skiboot-5.4.0-rc3 Pre-release

skiboot-5.4.0-rc3 was released on Wednesday November 2nd 2016. It is the
third release candidate of skiboot 5.4, which will become the new stable
release of skiboot following the 5.3 release, first released August 2nd 2016.

skiboot-5.4.0-rc3 contains all bug fixes as of :ref:skiboot-5.3.7
and :ref:skiboot-5.1.18 (the currently maintained stable releases).

For how the skiboot stable releases work, see :ref:stable-rules for details.

Since this is a release candidate, it should NOT be put into production.

The current plan is to release a new release candidate every week until we
feel good about it. The aim is for skiboot-5.4.x to be in op-build v1.13, which
is due by November 23rd 2016.

Over :ref:skiboot-5.4.0-rc2, we have a few changes:

  • pflash: Fail when file is larger than partition
    You can still shoot yourself in the foot by passing --force.
  • core/flash: Don't do anything clever for OPAL_FLASH_{READ, WRITE, ERASE}
    This fixes a bug where opal-prd and opal-gard could fail.
    Fixes: https://github.com/open-power/skiboot/issues/44
  • boot-tests: force BMC to boot from non-golden side
  • fast-reset: Send special reset sequence to operational CPUs only.
    Fixes fast-reset for cases where there are garded CPUs.
  • Secure/Trusted boot: be much clearer about what is being measured where.
  • Secure/Trusted boot: be more resilient to disabled TPM(s).
  • Secure/Trusted boot: The force-secure-mode NVRAM setting introduced
    temporarily in :ref:skiboot-5.4.0-rc2 has changed behaviour. Now, by
    default, the secure-mode flag in the device tree is obeyed. As always,
    any skiboot NVRAM options are in no way ABI, API or supported and may cause
    unfinished verbose analogies to appear in release notes relating to the
    dangers of using developer only options.
  • gard: Fix compiler warning on modern GCC targeting ARM 32-bit
  • opal-prd: systemd scripts improvements, only run on supported systems

skiboot 5.4.0 Release Candidate 2

26 Oct 06:24
skiboot-5.4.0-rc2
Pre-release

skiboot-5.4.0-rc2

skiboot-5.4.0-rc2 was released on Wednesday October 26th 2016. It is the
second release candidate of skiboot 5.4, which will become the new stable
release of skiboot following the 5.3 release, first released August 2nd 2016.

skiboot-5.4.0-rc2 contains all bug fixes as of :ref:skiboot-5.3.7
and :ref:skiboot-5.1.18 (the currently maintained stable releases).

For how the skiboot stable releases work, see :ref:stable-rules for details.

Since this is a release candidate, it should NOT be put into production.

The current plan is to release a new release candidate every week until we
feel good about it. The aim is for skiboot-5.4.x to be in op-build v1.13, which
is due by November 23rd 2016.

Over :ref:skiboot-5.4.0-rc1, we have a few changes:

Secure and Trusted Boot

skiboot 5.4.0-rc2 improves upon the progress towards Secure and Trusted Boot
in rc1. It is important to note that this is not a complete, end-to-end
secure/trusted boot implementation.

With the current code, it is now possible to verify and measure resources
loaded from PNOR by skiboot (namely the CAPP and BOOTKERNEL partitions).

Note that this functionality is currently only available on systems that
use the libflash backend. It is NOT enabled on IBM FSP based systems.
There is some support for some simulators though.

  • libstb/stb.c: ignore the secure mode flag unless forced in NVRAM

    For this stage in Trusted Boot development, we do not want to
    force Secure Mode through the whole firmware boot process, but we
    do want to be able to test it (classic chicken and egg problem with
    build infrastructure).

    We disable secure mode even if the secure-enabled property is
    read from the device tree, unless we are overriding it through NVRAM.
    Seeing as we can only increase (not decrease) what we're checking through
    the NVRAM variable, it is safe.

    The NVRAM setting is force-secure-mode=true in the ibm,skiboot partition.

    However, if you want to force secure mode even if Hostboot has not set
    the secure-enabled property in the device tree, set force-secure-mode
    to "always".

    There is also a force-trusted-mode NVRAM setting to force trusted mode
    even if Hostboot has not enabled it in the device tree.

    To indicate to Linux that we haven't gone through the whole firmware
    process in secure mode, we replace the 'secure-enabled' property with
    'partial-secure-enabled', to indicate that only part of the firmware
    boot process has gone through secure mode.
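
One reading of the force-secure-mode rules above, sketched in Python for brevity. The function name and the way the inputs are passed are assumptions of this sketch; the actual logic is C code in libstb/stb.c and may differ in detail.

```python
from typing import Optional


def secure_mode_enabled(dt_secure_enabled: bool,
                        nvram_force: Optional[str]) -> bool:
    """Decide whether secure mode is on.

    dt_secure_enabled: whether the secure-enabled property was set
                       in the device tree.
    nvram_force:       value of force-secure-mode in the ibm,skiboot
                       NVRAM partition, or None if it is not set.
    """
    if nvram_force == "always":
        # Forced on even if the device tree flag is missing.
        return True
    if nvram_force == "true":
        # Honour the device tree flag instead of ignoring it.
        return dt_secure_enabled
    # Default for this release candidate: the flag is ignored.
    return False
```

In this reading the NVRAM variable can only turn checking on, never off, which is why the override is considered safe.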

Command line arguments to BOOTKERNEL

  • core/init.c: Fix bootargs parsing

    Currently the bootargs are unconditionally deleted, which causes
    a bug where the bootargs passed in by the device tree are lost.

    This patch deletes bootargs only if it needs to be replaced by the NVRAM
    entry.

    This patch also removes KERNEL_COMMAND_LINE config option in favour of
    using the NVRAM or a device tree.
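
The fixed behaviour can be illustrated with a small sketch. A plain dict stands in for the device tree node that carries bootargs, and the function name is hypothetical; the real code is C in core/init.c.

```python
from typing import Optional


def apply_bootargs(node: dict, nvram_bootargs: Optional[str]) -> dict:
    """Replace node['bootargs'] only when NVRAM supplies a value."""
    if nvram_bootargs is not None:
        # The NVRAM entry wins over whatever the device tree passed in.
        node["bootargs"] = nvram_bootargs
    # Otherwise the bootargs from the device tree are left untouched.
    return node
```

Before the fix, the equivalent of the assignment's implicit delete ran unconditionally, so bootargs from the device tree were lost whenever NVRAM had no entry.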

pflash utility

  • external/pflash: Make MTD accesses the default

    Now that BMC and host kernel mtd drivers exist and have matured we
    should use them by default.

    This is especially important since we seem to be telling everyone to use
    pflash (pflash world domination plans are continuing on schedule).

  • external/pflash: Catch incompatible combination of flags

  • external/common: arm: Don't error trying to wrprotect with MTD access

  • libflash/libffs: Use blocklevel_smart_write() when updating partitions

Other changes

  • extract-gcov: build with -m64 if compiler supports it.

    Fixes build break on 32bit ppc64 (e.g. PowerMac G5, where user space
    is mostly 32bit).

Fast Reset

  • fast-reset: disable fast reboot in event of platform error

    Most of the time, if we're rebooting due to a platform error, we should
    trigger a checkstop. However, if we haven't been told what we should do
    to trigger a checkstop (e.g. on an FSP machine), then we should still
    fail to fast-reboot.

    So, disable fast-reboot in the OPAL_CEC_REBOOT2 code path
    for OPAL_REBOOT_PLATFORM_ERROR reboot type.

  • fast-reboot: disable on FSP code update or unrecoverable HMI

  • fast-reboot: abort fast reboot if CAPP attached

    If a PHB is in CAPI mode, we cannot safely fast reboot - the PHB will be
    fenced during the reboot resulting in major problems when we load the new
    kernel.

    In order to handle this safely, we need to disable CAPI mode before
    resetting PHBs during the fast reboot. However, we don't currently support
    this.

    In the meantime, when fast rebooting, check if there are any PHBs with a
    CAPP attached, and if so, abort the fast reboot and revert to a normal
    reboot instead.
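
Taken together, the platform error and CAPP conditions in this section amount to a guard in front of the fast reboot path. A minimal sketch, with hypothetical names and with each PHB reduced to a single flag (the FSP code update and HMI cases are left out):

```python
from typing import Iterable


def fast_reboot_allowed(platform_error: bool,
                        phb_capp_attached: Iterable[bool]) -> bool:
    """Return True only when a fast reboot is considered safe."""
    if platform_error:
        # OPAL_REBOOT_PLATFORM_ERROR: never take the fast path.
        return False
    if any(phb_capp_attached):
        # A PHB in CAPI mode would be fenced during the reboot.
        return False
    return True
```

When the guard returns False because of an attached CAPP, the caller falls back to a normal reboot.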

OpenPOWER Platforms

For all hardware platforms that aren't IBM FSP machines:

  • Revert "flash: Move flash node under ibm,opal/flash/"

    This reverts commit e1e6d00.

    Breaks DT enough that it makes people cranky, reverting for now.
    This could break access to flash with existing kernels in POWER9 simulators.

  • flash: rework flash_load_resource to correctly read FFS/STB

    This fixes the previous reverts of loading the CAPP partition with
    STB headers (which broke CAPP partitions without STB headers).

    The new logic fixes both CAPP partition loading with STB headers and
    addresses a long standing bug due to differing interpretations of FFS.

    The f_part utility that constructs PNOR files just sets actualSize=totalSize
    no matter what the size of the partition is. Prior to this patch,
    skiboot would always load actualSize, leading to longer than needed IPL.

    The pflash utility updates actualSize, so no developer has really ever
    noticed this, apart from maybe an inkling that it's odd that a freshly
    baked PNOR from op-build takes ever so slightly longer to boot than one
    that has had individual partitions pflashed in.

    With this patch, we now compute actualSize. For partitions with a STB
    header, we take the payload size from the STB header. For partitions
    that don't have a STB header, we compute the size either by parsing
    the ELF header or by looking at the subpartition header and computing it.

    We now need to read the entire partition for partitions with subpartitions
    so that we pass consistent values to be measured as part of Trusted Boot.

    As of this patch, the actualSize field in FFS is not relied on for
    partition size, we determine it from the content of the partition.

    However, this patch will break loading of partitions that are not ELF
    and do not contain subpartitions. Luckily, nothing in-tree makes use of
    that.
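
A rough sketch of the size selection described above, in Python for brevity. It assumes the STB payload size has already been parsed elsewhere, omits the subpartition case, and derives the ELF size as the end of the section header table, which is an assumption of this sketch and not necessarily how skiboot computes it. All function names are hypothetical.

```python
import struct
from typing import Optional


def elf64_size(header: bytes) -> int:
    """Estimate an ELF64 image size as the end of its section headers."""
    if len(header) < 64 or header[:4] != b"\x7fELF" or header[4] != 2:
        raise ValueError("not an ELF64 image")
    # EI_DATA: 1 means little endian, 2 means big endian.
    endian = "<" if header[5] == 1 else ">"
    (e_shoff,) = struct.unpack_from(endian + "Q", header, 0x28)
    e_shentsize, e_shnum = struct.unpack_from(endian + "HH", header, 0x3A)
    return e_shoff + e_shentsize * e_shnum


def content_size(stb_payload_size: Optional[int], data: bytes) -> int:
    """Pick the partition content size without trusting actualSize."""
    if stb_payload_size is not None:
        # Partitions with a STB header: use its payload size.
        return stb_payload_size
    # No STB header: fall back to the ELF header.
    return elf64_size(data)
```

A partition that has neither a STB header nor a recognisable ELF header raises an error here, mirroring the note that such partitions no longer load.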

PCI

  • pci: Check power state before powering off slot

    Prevents the erroneous "Error -1 powering off slot" error message.

Contributors

Since :ref:skiboot-5.4.0-rc1, we have 23 csets from 8 developers.

A total of 876 lines added, 621 removed (delta 255)

Developers with the most changesets

============================ = =======
Developer                    # %
============================ = =======
Stewart Smith                7 (30.4%)
Cyril Bur                    5 (21.7%)
Mukesh Ojha                  3 (13.0%)
Gavin Shan                   3 (13.0%)
Claudio Carvalho             2 (8.7%)
Chris Smart                  1 (4.3%)
Andrew Donnellan             1 (4.3%)
Nageswara R Sastry           1 (4.3%)
============================ = =======

Developers with the most changed lines

========================== === =======
Developer                  #   %
========================== === =======
Stewart Smith              424 (45.7%)
Mukesh Ojha                204 (22.0%)
Gavin Shan                 173 (18.6%)
Cyril Bur                  69  (7.4%)
Claudio Carvalho           35  (3.8%)
Andrew Donnellan           13  (1.4%)
Chris Smart                8   (0.9%)
Nageswara R Sastry         2   (0.2%)
========================== === =======

Developers with the most lines removed

============================ = =======
Developer                    # %
============================ = =======
Gavin Shan                   9 (1.4%)
Chris Smart                  4 (0.6%)
============================ = =======

Developers with the most signoffs (total 16)

============================ == ========
Developer                    #  %
============================ == ========
Stewart Smith                16 (100.0%)
============================ == ========

Developers with the most reviews (total 4)

============================ = =======
Developer                    # %
============================ = =======
Vasant Hegde                 2 (50.0%)
Andrew Donnellan             2 (50.0%)
============================ = =======

Developers with the most test credits (total 1)

============================ = ========
Developer                    # %
============================ = ========
Pridhiviraj Paidipeddi       1 (100.0%)
============================ = ========

Developers who gave the most tested-by credits (total 1)

============================ = ========
Developer                    # %
============================ = ========
Gavin Shan                   1 (100.0%)
============================ = ========

Developers with the most report credits (total 3)

============================ = =======
Developer                    # %
============================ = =======
Pridhiviraj Paidipeddi       1 (33.3%)
Andrei Warkenti              1 (33.3%)
Michael Neuling              1 (33.3%)
============================ = =======

Developers who gave the most report credits (total 3)

============================ = =======
Developer                    # %
============================ = =======
Stewart Smith                2 (66.7%)
Gavin Shan                   1 (33.3%)
============================ = =======