Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

m #1

Merged
merged 10,000 commits into from
Jun 19, 2014
Merged

m #1

merged 10,000 commits into from
Jun 19, 2014

Conversation

AnTAVR
Copy link
Owner

@AnTAVR AnTAVR commented Jun 19, 2014

No description provided.

mripard and others added 30 commits June 11, 2014 10:25
Right now, AHB is an indirect child clock of the CPU clock. If that
happens to change, since the CPU clock has no other consumers declared
in Linux, it would be shut down, which is not really a good idea.

Prevent this by forcing it enabled.

Signed-off-by: Maxime Ripard <[email protected]>
Acked-by: Mike Turquette <[email protected]>
Signed-off-by: Emilio López <[email protected]>
Prevent the SDRAM controller from being gated by force-enabling it in the
machine code.

Signed-off-by: Maxime Ripard <[email protected]>
Acked-by: Mike Turquette <[email protected]>
Signed-off-by: Emilio López <[email protected]>
The PRCM (Power/Reset/Clock Management) unit provides several clock
devices:
- AR100 clk: used to clock the Power Management co-processor
- AHB0 clk: used to clock the AHB0 bus
- APB0 clk and gates: used to clk peripherals connected to the APB0 bus

Add support for these clks in a separate driver so that they can be probed
as platform devices instead of registered during early init.
This is needed to be able to probe PRCM MFD subdevices.

Signed-off-by: Boris BREZILLON <[email protected]>
Acked-by: Maxime Ripard <[email protected]>
Signed-off-by: Emilio López <[email protected]>
Document new compatible strings for clock provided by the PRCM
(Power/Reset/Clock Management) unit.

Signed-off-by: Boris BREZILLON <[email protected]>
Signed-off-by: Emilio López <[email protected]>
As Ben suggested, it's meaningful to dump PE's location code
for site engineers when hitting EEH errors. The patch introduces
function eeh_pe_loc_get() to retireve the location code from
dev-tree so that we can output it when hitting EEH errors.

If primary PE bus is root bus, the PHB's dev-node would be tried
prior to root port's dev-node. Otherwise, the upstream bridge's
dev-node of the primary PE bus will be check for the location code
directly.

Signed-off-by: Gavin Shan <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Current code does not check for unhandled/unrecovered errors and return from
interrupt if it is recoverable exception which in-turn triggers same machine
check exception in a loop causing hypervisor to be unresponsive.

This patch fixes this situation and forces hypervisor to panic for
unhandled/unrecovered errors.

This patch also fixes another issue where unrecoverable_exception routine
was called in real mode in case of unrecoverable exception (MSR_RI = 0).
This causes another exception vector 0x300 (data access) during system crash
leading to confusion while debugging cause of the system crash.

Also turn ME bit off while going down, so that when another MCE is hit during
panic path, system will checkstop and hypervisor will get restarted cleanly
by SP.

With the above fixes we now throw correct console messages (see below) while
crashing the system in case of unhandled/unrecoverable machine checks.

--------------
Severe Machine check interrupt [[Not recovered]
  Initiator: CPU
  Error type: UE [Instruction fetch]
    Effective address: 0000000030002864
Oops: Machine check, sig: 7 [#1]
SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in: bork(O) bridge stp llc kvm [last unloaded: bork]
CPU: 36 PID: 55162 Comm: bash Tainted: G           O 3.14.0mce #1
task: c000002d72d022d0 ti: c000000007ec0000 task.ti: c000002d72de4000
NIP: 0000000030002864 LR: 00000000300151a4 CTR: 000000003001518c
REGS: c000000007ec3d80 TRAP: 0200   Tainted: G           O  (3.14.0mce)
MSR: 9000000000041002 <SF,HV,ME,RI>  CR: 28222848  XER: 20000000
CFAR: 0000000030002838 DAR: d0000000004d0000 DSISR: 00000000 SOFTE: 1
GPR00: 000000003001512c 0000000031f92cb0 0000000030078af0 0000000030002864
GPR04: d0000000004d0000 0000000000000000 0000000030002864 ffffffffffffffc9
GPR08: 0000000000000024 0000000030008af0 000000000000002c c00000000150e728
GPR12: 9000000000041002 0000000031f90000 0000000010142550 0000000040000000
GPR16: 0000000010143cdc 0000000000000000 00000000101306fc 00000000101424dc
GPR20: 00000000101424e0 000000001013c6f0 0000000000000000 0000000000000000
GPR24: 0000000010143ce0 00000000100f6440 c000002d72de7e00 c000002d72860250
GPR28: c000002d72860240 c000002d72ac0038 0000000000000008 0000000000040000
NIP [0000000030002864] 0x30002864
LR [00000000300151a4] 0x300151a4
Call Trace:
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
---[ end trace 7285f0beac1e29d3 ]---

Sending IPI to other CPUs
IPI complete
OPAL V3 detected !
--------------

Signed-off-by: Mahesh Salgaonkar <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Currently machine check handler does not check for stack overflow for
nested machine check. If we hit another MCE while inside the machine check
handler repeatedly from same address then we get into risk of stack
overflow which can cause huge memory corruption. This patch limits the
nested MCE level to 4 and panic when we cross level 4.

Signed-off-by: Mahesh Salgaonkar <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
…call.

We don't see MCE counter getting increased in /proc/interrupts which gives
false impression of no MCE occurred even when there were MCE events.
The machine check early handling was added for PowerKVM and we missed to
increment the MCE count in the early handler.

We also increment mce counters in the machine_check_exception call, but
in most cases where we handle the error hypervisor never reaches there
unless its fatal and we want to crash. Only during fatal situation we may
see double increment of mce count. We need to fix that. But for
now it always good to have some count increased instead of zero.

Signed-off-by: Mahesh Salgaonkar <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
… in guest.

Currently we forward MCEs to guest which have been recovered by guest.
And for unhandled errors we do not deliver the MCE to guest. It looks like
with no support of FWNMI in qemu, guest just panics whenever we deliver the
recovered MCEs to guest. Also, the existig code used to return to host for
unhandled errors which was casuing guest to hang with soft lockups inside
guest and makes it difficult to recover guest instance.

This patch now forwards all fatal MCEs to guest causing guest to crash/panic.
And, for recovered errors we just go back to normal functioning of guest
instead of returning to host. This fixes soft lockup issues in guest.
This patch also fixes an issue where guest MCE events were not logged to
host console.

Signed-off-by: Mahesh Salgaonkar <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
…into clk-next

Rebase of Emilio's clk-sunxi-for-3.16 on top of clk-next

Fixed a few compilation warnings exposed by a patch introduced during the 3.16
merge window.

Original tag message:

Allwinner sunXi SoCs clock changes

This pull contains some new code to add support for A31 clocks by Maxime
and Boris. It also reworks the driver a bit to avoid having a huge
single file when we have a full folder for ourselves, and separating
different functional units makes sense.
There was a bug in debug printout when CONFIG_REISERFS_CHECK was
enabled so one of the assertions in do_balan.c didn't compile. Fix it.

Fixes: 0080e9f
Signed-off-by: Jan Kara <[email protected]>
Pull md updates from Neil Brown:
 "Assorted md fixes for 3.16

  Mostly performance improvements with a few corner-case bug fixes"

* tag 'md/3.16' of git://neil.brown.name/md:
  raid5: speedup sync_request processing
  md/raid5: deadlock between retry_aligned_read with barrier io
  raid5: add an option to avoid copy data from bio to stripe cache
  md/bitmap: remove confusing code from filemap_get_page.
  raid5: avoid release list until last reference of the stripe
  md: md_clear_badblocks should return an error code on failure.
  md/raid56: Don't perform reads to support writes until stripe is ready.
  md: refuse to change shape of array if it is active but read-only
Pull MTD updates from Brian Norris:
 - refactor m25p80.c driver for use as a general SPI NOR framework for
   other drivers which may speak to SPI NOR flash without providing full
   SPI support (i.e., not part of drivers/spi/)
 - new Freescale QuadSPI driver (utilizing new SPI NOR framework)
 - updates for the STMicro "FSM" SPI NOR driver
 - fix sync/flush behavior on mtd_blkdevs
 - fixup subpage write support on a few NAND drivers
 - correct the MTD OOB test for odd-sized OOB areas
 - add BCH-16 support for OMAP NAND
 - fix warnings and trivial refactoring
 - utilize new ECC DT bindings in pxa3xx NAND driver
 - new LPDDR NVM driver
 - address a few assorted bugs caught by Coverity
 - add new imx6sx support for GPMI NAND
 - use a bounce buffer for NAND when non-DMA-able buffers are used

* tag 'for-linus-20140610' of git://git.infradead.org/linux-mtd: (77 commits)
  mtd: gpmi: add gpmi support for imx6sx
  mtd: maps: remove check for CONFIG_MTD_SUPERH_RESERVE
  mtd: bf5xx_nand: use the managed version of kzalloc
  mtd: pxa3xx_nand: make the driver work on big-endian systems
  mtd: nand: omap: fix omap_calculate_ecc_bch() for-loop error
  mtd: nand: r852: correct write_buf loop bounds
  mtd: nand_bbt: handle error case for nand_create_badblock_pattern()
  mtd: nand_bbt: remove unused variable
  mtd: maps: sc520cdp: fix warnings
  mtd: slram: fix unused variable warning
  mtd: pfow: remove unused variable
  mtd: lpddr: fix Kconfig dependency, for I/O accessors
  mtd: nand: pxa3xx: Add supported ECC strength and step size to the DT binding
  mtd: nand: pxa3xx: Use ECC strength and step size devicetree binding
  mtd: nand: pxa3xx: Clean pxa_ecc_init() error handling
  mtd: nand: Warn the user if the selected ECC strength is too weak
  mtd: nand: omap: Documentation: How to select correct ECC scheme for your device ?
  mtd: nand: omap: add support for BCH16_ECC - NAND driver updates
  mtd: nand: omap: add support for BCH16_ECC - ELM driver updates
  mtd: nand: omap: add support for BCH16_ECC - GPMC driver updates
  ...
Pull block layer fixes from Jens Axboe:
 "Final small batch of fixes to be included before -rc1.  Some general
  cleanups in here as well, but some of the blk-mq fixes we need for the
  NVMe conversion and/or scsi-mq.  The pull request contains:

   - Support for not merging across a specified "chunk size", if set by
     the driver.  Some NVMe devices perform poorly for IO that crosses
     such a chunk, so we need to support it generically as part of
     request merging avoid having to do complicated split logic.  From
     me.

   - Bump max tag depth to 10Ki tags.  Some scsi devices have a huge
     shared tag space.  Before we failed with EINVAL if a too large tag
     depth was specified, now we truncate it and pass back the actual
     value.  From me.

   - Various blk-mq rq init fixes from me and others.

   - A fix for enter on a dying queue for blk-mq from Keith.  This is
     needed to prevent oopsing on hot device removal.

   - Fixup for blk-mq timer addition from Ming Lei.

   - Small round of performance fixes for mtip32xx from Sam Bradshaw.

   - Minor stack leak fix from Rickard Strandqvist.

   - Two __init annotations from Fabian Frederick"

* 'for-linus' of git://git.kernel.dk/linux-block:
  block: add __init to blkcg_policy_register
  block: add __init to elv_register
  block: ensure that bio_add_page() always accepts a page for an empty bio
  blk-mq: add timer in blk_mq_start_request
  blk-mq: always initialize request->start_time
  block: blk-exec.c: Cleaning up local variable address returnd
  mtip32xx: minor performance enhancements
  blk-mq: ->timeout should be cleared in blk_mq_rq_ctx_init()
  blk-mq: don't allow queue entering for a dying queue
  blk-mq: bump max tag depth to 10K tags
  block: add blk_rq_set_block_pc()
  block: add notion of a chunk size for request merging
Return a 0 directly rather than a constant.

Reported-by: Peter Senna Tschudin <[email protected]>
Signed-off-by: Todd Fujinaka <[email protected]>
Tested-by: Aaron Brown <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
This patch separates the hardware logic from the set function, so that
we can re-use it during a ptp_reset. This enables the reset to return
functionality to the last known timestamp mode, rather than resetting
the value. We initialize the mode to off during the ptp_init cycle.

Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Aaron Brown <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
Set appropriate fields in Tx queue configuration virtchnl message
to pf to enable headwb and setup headwb addr.
Then use that info from the VF to set headwb and headwb_addr instead of
always enabling them.

Change-ID: I7d393d1b2b07f0f3355b3a4f7c2d3c6ee3b0d622
Signed-off-by: Ashish Shah <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
Add set_pf_context, replace set_phy_reset with set_phy_debug, add
nvm_config_read/write, remove nvm_read/write_reg_se and add some
PHY types.

With these changes we bump the API version to 1.2.

Change-ID: I4dc3aec175c2316f66fc9b726b3f7d594699d84e
Signed-off-by: Shannon Nelson <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
In XL710 devices we program FD filter's fields from Tx perspective of the flow.
However the user interface exposed in ethtool should be compliant with the
previous generation of drivers where a filter src and dst field are from
the RX perspective. This patch changes the ethtool interface in this regard
to match the other drivers.

Change-ID: Iec6ccddd87357c4fb53ccf33aa0fae699faf70cf
Signed-off-by: Anjali Singhai Jain <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
If the device is down, there's no place to go but up, so don't try to go
down even more. This prevents a CPU soft lock in napi_disable().

Change-ID: I8b058b9ee974dfa01c212fae2597f4f54b333314
Signed-off-by: Mitch Williams <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
Currently if the firmware reports DCB capability the driver enables
I40E_FLAG_DCB_ENABLED flag. When this flag is enabled the driver
inserts a tag when transmitting a packet from the port even if there
are no DCB traffic classes configured at the port.

This patch adds a new flag I40E_FLAG_DCB_CAPABLE that will be set
when the DCB capability is present and the existing flag
I40E_FLAG_DCB_ENABLED will be set only if there are more than one
traffic classes configured at the port.

Change-ID: I24ccbf53ef293db2eba80c8a9772acf729795bd5
Signed-off-by: Neerav Parikh <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
Remove the filter created by the firmware with the default MAC address it
reads out of the NVM storage and a promiscuous VLAN tag and replace it
with a filter that will not accept tagged packets by default.  The system
must request a VLAN tag packet filter to get packets with that tag.

Change-ID: I119e6c3603a039bd68282ba31bf26f33a575490a
Signed-off-by: Greg Rose <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
Linux gives us a function to copy Ethernet MAC addresses, let's use it.

Change-ID: I0c861900029ca5ea65a53ca39565852fb633f6fd
Signed-off-by: Greg Rose <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
FW can indicate any admin queue error states to the driver via some bits
in the length registers. Each time we process an admin queue message,
check these bits and log any errors we find. Since the VF really can't
do much, we just print the message and depend on the PF driver to clear
things up on our behalf.

Change-ID: I92bc6c53ce3b4400544e0ca19c5de2d27490bd0d
Signed-off-by: Mitch Williams <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
This change drops the FTYPE field from the Rx descriptor, to
match the hardware implementation.

Change-ID: I66d31d2b43861da45e8ace4fb03df033abe88bab
Signed-off-by: Jesse Brandeburg <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
Add members to stat struct to keep track of Flow director ATR and
SideBand filter packet matches.

Change-ID: Ibbb31a53c7adcc2bb96991dd80565442a2f2513c
Signed-off-by: Anjali Singhai Jain <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
A drop action comes down as a ring_cookie value, so allow it as
a special value that can be used to configure destination control.

Also fix the output to filter read command accordingly.

Change-ID: I9956723cee42f3194885403317dd21ed4a151144
Signed-off-by: Anjali Singhai Jain <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
No need to read the PCI register for the PF's base queue on every single Tx
queue enable and disable as we already have the value stored from reading
the capability features at startup.

Change-ID: Ic02fb622757742f43cb8269369c3d972d4f66555
Signed-off-by: Shannon Nelson <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
Bump versions.

Change-ID: Ic4a84354955061ca18321b1e97c9c30fe1563b5c
Signed-off-by: Catherine Sullivan <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
Pull xfs updates from Dave Chinner:
 "This update contains:
   - cleanup removing unused function args
   - rework of the filestreams allocator to use dentry cache parent
     lookups
   - new on-disk free inode btree and optimised inode allocator
   - various bug fixes
   - rework of internal attribute API
   - cleanup of superblock feature bit support to remove historic cruft
   - more fixes and minor cleanups
   - added a new directory/attribute geometry abstraction
   - yet more fixes and minor cleanups"

* tag 'xfs-for-linus-3.16-rc1' of git://oss.sgi.com/xfs/xfs: (86 commits)
  xfs: fix xfs_da_args sparse warning in xfs_readdir
  xfs: Fix rounding in xfs_alloc_fix_len()
  xfs: tone down writepage/releasepage WARN_ONs
  xfs: small cleanup in xfs_lowbit64()
  xfs: kill xfs_buf_geterror()
  xfs: xfs_readsb needs to check for magic numbers
  xfs: block allocation work needs to be kswapd aware
  xfs: remove redundant geometry information from xfs_da_state
  xfs: replace attr LBSIZE with xfs_da_geometry
  xfs: pass xfs_da_args to xfs_attr_leaf_newentsize
  xfs: use xfs_da_geometry for block size in attr code
  xfs: remove mp->m_dir_geo from directory logging
  xfs: reduce direct usage of mp->m_dir_geo
  xfs: move node entry counts to xfs_da_geometry
  xfs: convert dir/attr btree threshold to xfs_da_geometry
  xfs: convert m_dirblksize to xfs_da_geometry
  xfs: convert m_dirblkfsbs to xfs_da_geometry
  xfs: convert directory segment limits to xfs_da_geometry
  xfs: convert directory db conversion to xfs_da_geometry
  xfs: convert directory dablk conversion to xfs_da_geometry
  ...
torvalds and others added 26 commits June 14, 2014 19:48
…/git/mason/linux-btrfs

Pull more btrfs updates from Chris Mason:
 "This has a few fixes since our last pull and a new ioctl for doing
  btree searches from userland.  It's very similar to the existing
  ioctl, but lets us return larger items back down to the app"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: fix error handling in create_pending_snapshot
  btrfs: fix use of uninit "ret" in end_extent_writepage()
  btrfs: free ulist in qgroup_shared_accounting() error path
  Btrfs: fix qgroups sanity test crash or hang
  btrfs: prevent RCU warning when dereferencing radix tree slot
  Btrfs: fix unfinished readahead thread for raid5/6 degraded mounting
  btrfs: new ioctl TREE_SEARCH_V2
  btrfs: tree_search, search_ioctl: direct copy to userspace
  btrfs: new function read_extent_buffer_to_user
  btrfs: tree_search, copy_to_sk: return needed size on EOVERFLOW
  btrfs: tree_search, copy_to_sk: return EOVERFLOW for too small buffer
  btrfs: tree_search, search_ioctl: accept varying buffer
  btrfs: tree_search: eliminate redundant nr_items check
…el/git/jejb/scsi

Pull more SCSI updates from James Bottomley:
 "This is just a couple of drivers (hpsa and lpfc) that got left out for
  further testing in linux-next.  We also have one fix to a prior
  submission (qla2xxx sparse)"

* tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (36 commits)
  qla2xxx: fix sparse warnings introduced by previous target mode t10-dif patch
  lpfc: Update lpfc version to driver version 10.2.8001.0
  lpfc: Fix ExpressLane priority setup
  lpfc: mark old devices as obsolete
  lpfc: Fix for initializing RRQ bitmap
  lpfc: Fix for cleaning up stale ring flag and sp_queue_event entries
  lpfc: Update lpfc version to driver version 10.2.8000.0
  lpfc: Update Copyright on changed files from 8.3.45 patches
  lpfc: Update Copyright on changed files
  lpfc: Fixed locking for scsi task management commands
  lpfc: Convert runtime references to old xlane cfg param to fof cfg param
  lpfc: Fix FW dump using sysfs
  lpfc: Fix SLI4 s abort loop to process all FCP rings and under ring_lock
  lpfc: Fixed kernel panic in lpfc_abort_handler
  lpfc: Fix locking for postbufq when freeing
  lpfc: Fix locking for lpfc_hba_down_post
  lpfc: Fix dynamic transitions of FirstBurst from on to off
  hpsa: fix handling of hpsa_volume_offline return value
  hpsa: return -ENOMEM not -1 on kzalloc failure in hpsa_get_device_id
  hpsa: remove messages about volume status VPD inquiry page not supported
  ...
Joseph Gasparakis reported that VXLAN GSO offload stopped working with
i40e device after recent UDP changes. The problem is that the
SKB_GSO_* bits are out of sync with the corresponding NETIF flags. This
patch fixes that. Also, we add BUILD_BUG_ONs in net_gso_ok for several
GSO constants that were missing to avoid the problem in the future.

Reported-by: Joseph Gasparakis <[email protected]>
Signed-off-by: Tom Herbert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Geert reported issues regarding checksum complete and UDP.
The logic introduced in commit 7e3cead
("net: Save software checksum complete") is not correct.

This patch:
1) Restores code in __skb_checksum_complete_header except for setting
   CHECKSUM_UNNECESSARY. This function may be calculating checksum on
   something less than skb->len.
2) Adds saving checksum to __skb_checksum_complete. The full packet
   checksum 0..skb->len is calculated without adding in pseudo header.
   This value is saved in skb->csum and then the pseudo header is added
   to that to derive the checksum for validation.
3) In both __skb_checksum_complete_header and __skb_checksum_complete,
   set skb->csum_valid to whether checksum of zero was computed. This
   allows skb_csum_unnecessary to return true without changing to
   CHECKSUM_UNNECESSARY which was done previously.
4) Copy new csum related bits in __copy_skb_header.

Reported-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Tom Herbert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
In __udp_lib_checksum_complete check if checksum is being done over all
the data (len is equal to skb->len) and if it is call
__skb_checksum_complete instead of __skb_checksum_complete_head. This
allows checksum to be saved in checksum complete.

Signed-off-by: Tom Herbert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
This function is used by UDP encapsulation protocols in RX when
crossing encapsulation boundary. If ip_summed is set to
CHECKSUM_UNNECESSARY and encapsulation is not set, change to
CHECKSUM_NONE since the checksum has not been validated within the
encapsulation. Clears csum_valid by the same rationale.

Signed-off-by: Tom Herbert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Call skb_pop_rcv_encapsulation and postpull_rcsum for the Ethernet
header to work properly with checksum complete.

Signed-off-by: Tom Herbert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Tom Herbert says:

====================
Fixes related to some recent checksum modifications.

- Fix GSO constants to match NETIF flags
- Fix logic in saving checksum complete in __skb_checksum_complete
- Call __skb_checksum_complete from UDP if we are checksumming over
  whole packet in order to save checksum.
- Fixes to VXLAN to work correctly with checksum complete
====================

Signed-off-by: David S. Miller <[email protected]>
Commit 3fd091e ("[SCTP]: Remove multiple levels of msecs
to jiffies conversions.") has silently changed permissions for
rto_alpha and rto_beta knobs from 0644 to 0444. The purpose of
this was to discourage users from tweaking rto_alpha and
rto_beta knobs in production environments since they are key
to correctly compute rtt/srtt.

RFC4960 under section 6.3.1. RTO Calculation says regarding
rto_alpha and rto_beta under rule C3 and C4:

  [...]
  C3)  When a new RTT measurement R' is made, set

       RTTVAR <- (1 - RTO.Beta) * RTTVAR + RTO.Beta * |SRTT - R'|

       and

       SRTT <- (1 - RTO.Alpha) * SRTT + RTO.Alpha * R'

       Note: The value of SRTT used in the update to RTTVAR
       is its value before updating SRTT itself using the
       second assignment. After the computation, update
       RTO <- SRTT + 4 * RTTVAR.

  C4)  When data is in flight and when allowed by rule C5
       below, a new RTT measurement MUST be made each round
       trip. Furthermore, new RTT measurements SHOULD be
       made no more than once per round trip for a given
       destination transport address. There are two reasons
       for this recommendation: First, it appears that
       measuring more frequently often does not in practice
       yield any significant benefit [ALLMAN99]; second,
       if measurements are made more often, then the values
       of RTO.Alpha and RTO.Beta in rule C3 above should be
       adjusted so that SRTT and RTTVAR still adjust to
       changes at roughly the same rate (in terms of how many
       round trips it takes them to reflect new values) as
       they would if making only one measurement per
       round-trip and using RTO.Alpha and RTO.Beta as given
       in rule C3. However, the exact nature of these
       adjustments remains a research issue.
  [...]

While it is discouraged to adjust rto_alpha and rto_beta
and not further specified how to adjust them, the RFC also
doesn't explicitly forbid it, but rather gives a RECOMMENDED
default value (rto_alpha=3, rto_beta=2). We have a couple
of users relying on the old permissions before they got
changed. That said, if someone really has the urge to adjust
them, we could allow it with a warning in the log.

Fixes: 3fd091e ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.")
Signed-off-by: Daniel Borkmann <[email protected]>
Cc: Vlad Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Commit 0fb7a01 "random: simplify accounting code", introduced in
v3.15, has a very nasty accounting problem when the entropy pool has
has fewer bytes of entropy than the number of requested reserved
bytes.  In that case, "have_bytes - reserved" goes negative, and since
size_t is unsigned, the expression:

       ibytes = min_t(size_t, ibytes, have_bytes - reserved);

... does not do the right thing.  This is rather bad, because it
defeats the catastrophic reseeding feature in the
xfer_secondary_pool() path.

It also can cause the "BUG: spinlock trylock failure on UP" for some
kernel configurations when prandom_reseed() calls get_random_bytes()
in the early init, since when the entropy count gets corrupted,
credit_entropy_bits() erroneously believes that the nonblocking pool
has been fully initialized (when in fact it is not), and so it calls
prandom_reseed(true) recursively leading to the spinlock BUG.

The logic is *not* the same it was originally, but in the cases where
it matters, the behavior is the same, and the resulting code is
hopefully easier to read and understand.

Fixes: 0fb7a01 "random: simplify accounting code"
Signed-off-by: Theodore Ts'o <[email protected]>
Cc: Greg Price <[email protected]>
Cc: [email protected]  #v3.15
Pull NVMe update from Matthew Wilcox:
 "Mostly bugfixes again for the NVMe driver.  I'd like to call out the
  exported tracepoint in the block layer; I believe Keith has cleared
  this with Jens.

  We've had a few reports from people who're really pounding on NVMe
  devices at scale, hence the timeout changes (and new module
  parameters), hotplug cpu deadlock, tracepoints, and minor performance
  tweaks"

[ Jens hadn't seen that tracepoint thing, but is ok with it - it will
  end up going away when mq conversion happens ]

* git://git.infradead.org/users/willy/linux-nvme: (22 commits)
  NVMe: Fix START_STOP_UNIT Scsi->NVMe translation.
  NVMe: Use Log Page constants in SCSI emulation
  NVMe: Define Log Page constants
  NVMe: Fix hot cpu notification dead lock
  NVMe: Rename io_timeout to nvme_io_timeout
  NVMe: Use last bytes of f/w rev SCSI Inquiry
  NVMe: Adhere to request queue block accounting enable/disable
  NVMe: Fix nvme get/put queue semantics
  NVMe: Delete NVME_GET_FEAT_TEMP_THRESH
  NVMe: Make admin timeout a module parameter
  NVMe: Make iod bio timeout a parameter
  NVMe: Prevent possible NULL pointer dereference
  NVMe: Fix the buffer size passed in GetLogPage(CDW10.NUMD)
  NVMe: Update data structures for NVMe 1.2
  NVMe: Enable BUILD_BUG_ON checks
  NVMe: Update namespace and controller identify structures to the 1.1a spec
  NVMe: Flush with data support
  NVMe: Configure support for block flush
  NVMe: Add tracepoints
  NVMe: Protect against badly formatted CQEs
  ...
…ike.turquette/linux

Pull more clock framework updates from Mike Turquette:
 "This contains the second half the of the clk changes for 3.16.

  They are simply fixes and code refactoring for the OMAP clock drivers.
  The sunxi clock driver changes include splitting out the one
  mega-driver into several smaller pieces and adding support for the A31
  SoC clocks"

* tag 'clk-for-linus-3.16-part2' of git://git.linaro.org/people/mike.turquette/linux: (25 commits)
  clk: sunxi: document PRCM clock compatible strings
  clk: sunxi: add PRCM (Power/Reset/Clock Management) clks support
  clk: sun6i: Protect SDRAM gating bit
  clk: sun6i: Protect CPU clock
  clk: sunxi: Rework clock protection code
  clk: sunxi: Move the GMAC clock to a file of its own
  clk: sunxi: Move the 24M oscillator to a file of its own
  clk: sunxi: Remove calls to clk_put
  clk: sunxi: document new A31 USB clock compatible
  clk: sunxi: Implement A31 USB clock
  ARM: dts: OMAP5/DRA7: use omap5-mpu-dpll-clock capable of dealing with higher frequencies
  CLK: TI: dpll: support OMAP5 MPU DPLL that need special handling for higher frequencies
  ARM: OMAP5+: dpll: support Duty Cycle Correction(DCC)
  CLK: TI: clk-54xx: Set the rate for dpll_abe_m2x2_ck
  CLK: TI: Driver for DRA7 ATL (Audio Tracking Logic)
  dt:/bindings: DRA7 ATL (Audio Tracking Logic) clock bindings
  ARM: dts: dra7xx-clocks: Correct name for atl clkin3 clock
  CLK: TI: gate: add composite interface clock to OMAP2 only build
  ARM: OMAP2: clock: add DT boot support for cpufreq_ck
  CLK: TI: OMAP2: add clock init support
  ...
Pull networking fixes from David Miller:

 1) Fix checksumming regressions, from Tom Herbert.

 2) Undo unintentional permissions changes for SCTP rto_alpha and
    rto_beta sysfs knobs, from Denial Borkmann.

 3) VXLAN, like other IP tunnels, should advertize it's encapsulation
    size using dev->needed_headroom instead of dev->hard_header_len.
    From Cong Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  net: sctp: fix permissions for rto_alpha and rto_beta knobs
  vxlan: Checksum fixes
  net: add skb_pop_rcv_encapsulation
  udp: call __skb_checksum_complete when doing full checksum
  net: Fix save software checksum complete
  net: Fix GSO constants to match NETIF flags
  udp: ipv4: do not waste time in __udp4_lib_mcast_demux_lookup
  vxlan: use dev->needed_headroom instead of dev->hard_header_len
  MAINTAINERS: update cxgb4 maintainer
This reverts commit e1edf18.

This patch was a misguided attempt at fixing offb for LE ppc64
kernels on BE qemu but is just wrong ... it breaks real LE/LE
setups, LE with real HW, and existing mixed endian systems
that did the fight thing with the appropriate device-tree
property. Bad reviewing on my part, sorry.

The right fix is to either make qemu change its endian when
the guest changes endian (working on that) or to use the
existing foreign endian support.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
CC: <[email protected]> [v3.13+]
---
of_find_node_by_name() walks the allnodes list, and can thus walk
outside of the parent node. Use of_get_child_by_name() instead.

Signed-off-by: Axel Lin <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
…/ltc3589' and 'regulator/fix/palmas' into regulator-linus
Commit bf5db2f (microblaze: Use generic device.h) removes the
microblaze specific pdev_archdata and dma_mask.

At the same time, commit 591c1ee (of: configure the platform
device dma parameters) initializes the just removed field.
This causes all microblaze builds to fail.

Drop the unnecessary initialization.

Cc: Michal Simek <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Acked-by: Michal Simek <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
This makes sure a format string cannot leak into the kobject name that
is constructed. (And splits the >80 character line.)

Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
of_find_node_by_path() is borked because of_allnodes is not guaranteed to
contain the root of the tree after using any of the dynamic update functions
because some other nodes ends up as of_allnodes.

Fixes: c22e650 of: Make of_find_node_by_path() handle /aliases
Reported-by: [email protected]
Signed-off-by: Frank Rowand <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
This fixes use-after-free of epi->fllink.next inside list loop macro.
This loop actually releases elements in the body.  The list is
rcu-protected but here we cannot hold rcu_read_lock because we need to
lock mutex inside.

The obvious solution is to use list_for_each_entry_safe().  RCU-ness
isn't essential because nobody can change this list under us, it's final
fput for this file.

The bug was introduced by ae10b2b ("epoll: optimize EPOLL_CTL_DEL
using rcu")

Signed-off-by: Konstantin Khlebnikov <[email protected]>
Reported-by: Cyrill Gorcunov <[email protected]>
Cc: Stable <[email protected]> # 3.13+
Cc: Sasha Levin <[email protected]>
Cc: Jason Baron <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
…l/git/broonie/spi

Pull spi fix from Mark Brown:
 "A single bugfix from the merge window, fixing an issue with DMA at
  slow speeds on Intel hardware"

* tag 'spi-v3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi/pxa2xx: change default supported DMA burst size to 1
…/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
 "A few small fixes:

   - Fixes for bugs in Palmas and as3722 exposed by recent changes in
     the core - the drivers weren't allowing the current configuration
     to be read sometimes.
   - Provide a stub for regulator_can_change_voltage() for !REGULATOR.
   - Make sure ltc3589 only looks at its child nodes in the device tree
     rather than searching the whole tree"

* tag 'regulator-v3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: ltc3589: Use of_get_child_by_name
  regulator: palmas: Fix SMPS list for 0V
  regulator: add regulator_can_change_voltage stub
  regulator: as3722: Make 0 a valid selector
…rnel/git/tytso/random

Pull randomness bugfix from Ted Ts'o:
 "random: fix entropy accounting bug introduced in v3.15"

* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
  random: fix nasty entropy accounting bug
…ernel/git/robh/linux

Pull device tree fixes from Rob Herring:
 - fix microblaze compiling due to conflicting merge window changes
 - a fix in of_attach_node due to of_find_node_by_path changes
 - prevent format strings in DT kobject names

* tag 'dt-fixes-for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  OF: fix of_find_node_by_path() assumption that of_allnodes is root
  of: avoid format string parsing in kobject names
  of/platform: Fix microblaze build failure
AnTAVR added a commit that referenced this pull request Jun 19, 2014
@AnTAVR AnTAVR merged commit 190bcee into AnTAVR:master Jun 19, 2014
AnTAVR pushed a commit that referenced this pull request Jun 19, 2014
Avoid that closing /dev/infiniband/umad<n> or /dev/infiniband/issm<n>
triggers a use-after-free.  __fput() invokes f_op->release() before it
invokes cdev_put().  Make sure that the ib_umad_device structure is
freed by the cdev_put() call instead of f_op->release().  This avoids
that changing the port mode from IB into Ethernet and back to IB
followed by restarting opensmd triggers the following kernel oops:

    general protection fault: 0000 [#1] PREEMPT SMP
    RIP: 0010:[<ffffffff810cc65c>]  [<ffffffff810cc65c>] module_put+0x2c/0x170
    Call Trace:
     [<ffffffff81190f20>] cdev_put+0x20/0x30
     [<ffffffff8118e2ce>] __fput+0x1ae/0x1f0
     [<ffffffff8118e35e>] ____fput+0xe/0x10
     [<ffffffff810723bc>] task_work_run+0xac/0xe0
     [<ffffffff81002a9f>] do_notify_resume+0x9f/0xc0
     [<ffffffff814b8398>] int_signal+0x12/0x17

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=75051
Signed-off-by: Bart Van Assche <[email protected]>
Reviewed-by: Yann Droneaud <[email protected]>
Cc: <[email protected]> # 3.x: 8ec0a0e: IB/umad: Fix error handling
Signed-off-by: Roland Dreier <[email protected]>
AnTAVR pushed a commit that referenced this pull request Jun 19, 2014
When running low on available disk space and having several processes
doing buffered file IO, I got the following trace in dmesg:

[ 4202.720152] INFO: task kworker/u8:1:5450 blocked for more than 120 seconds.
[ 4202.720401]       Not tainted 3.13.0-fdm-btrfs-next-26+ #1
[ 4202.720596] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4202.720874] kworker/u8:1    D 0000000000000001     0  5450      2 0x00000000
[ 4202.720904] Workqueue: btrfs-flush_delalloc normal_work_helper [btrfs]
[ 4202.720908]  ffff8801f62ddc38 0000000000000082 ffff880203ac2490 00000000001d3f40
[ 4202.720913]  ffff8801f62ddfd8 00000000001d3f40 ffff8800c4f0c920 ffff880203ac2490
[ 4202.720918]  00000000001d4a40 ffff88020fe85a40 ffff88020fe85ab8 0000000000000001
[ 4202.720922] Call Trace:
[ 4202.720931]  [<ffffffff816a3cb9>] schedule+0x29/0x70
[ 4202.720950]  [<ffffffffa01ec48d>] btrfs_start_ordered_extent+0x6d/0x110 [btrfs]
[ 4202.720956]  [<ffffffff8108e620>] ? bit_waitqueue+0xc0/0xc0
[ 4202.720972]  [<ffffffffa01ec559>] btrfs_run_ordered_extent_work+0x29/0x40 [btrfs]
[ 4202.720988]  [<ffffffffa0201987>] normal_work_helper+0x137/0x2c0 [btrfs]
[ 4202.720994]  [<ffffffff810680e5>] process_one_work+0x1f5/0x530
(...)
[ 4202.721027] 2 locks held by kworker/u8:1/5450:
[ 4202.721028]  #0:  (%s-%s){++++..}, at: [<ffffffff81068083>] process_one_work+0x193/0x530
[ 4202.721037]  #1:  ((&work->normal_work)){+.+...}, at: [<ffffffff81068083>] process_one_work+0x193/0x530
[ 4202.721054] INFO: task btrfs:7891 blocked for more than 120 seconds.
[ 4202.721258]       Not tainted 3.13.0-fdm-btrfs-next-26+ #1
[ 4202.721444] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4202.721699] btrfs           D 0000000000000001     0  7891   7890 0x00000001
[ 4202.721704]  ffff88018c2119e8 0000000000000086 ffff8800a33d2490 00000000001d3f40
[ 4202.721710]  ffff88018c211fd8 00000000001d3f40 ffff8802144b0000 ffff8800a33d2490
[ 4202.721714]  ffff8800d8576640 ffff88020fe85bc0 ffff88020fe85bc8 7fffffffffffffff
[ 4202.721718] Call Trace:
[ 4202.721723]  [<ffffffff816a3cb9>] schedule+0x29/0x70
[ 4202.721727]  [<ffffffff816a2ebc>] schedule_timeout+0x1dc/0x270
[ 4202.721732]  [<ffffffff8109bd79>] ? mark_held_locks+0xb9/0x140
[ 4202.721736]  [<ffffffff816a90c0>] ? _raw_spin_unlock_irq+0x30/0x40
[ 4202.721740]  [<ffffffff8109bf0d>] ? trace_hardirqs_on_caller+0x10d/0x1d0
[ 4202.721744]  [<ffffffff816a488f>] wait_for_completion+0xdf/0x120
[ 4202.721749]  [<ffffffff8107fa90>] ? try_to_wake_up+0x310/0x310
[ 4202.721765]  [<ffffffffa01ebee4>] btrfs_wait_ordered_extents+0x1f4/0x280 [btrfs]
[ 4202.721781]  [<ffffffffa020526e>] btrfs_mksubvol.isra.62+0x30e/0x5a0 [btrfs]
[ 4202.721786]  [<ffffffff8108e620>] ? bit_waitqueue+0xc0/0xc0
[ 4202.721799]  [<ffffffffa02056a9>] btrfs_ioctl_snap_create_transid+0x1a9/0x1b0 [btrfs]
[ 4202.721813]  [<ffffffffa020583a>] btrfs_ioctl_snap_create_v2+0x10a/0x170 [btrfs]
(...)

It turns out that extent_io.c:__extent_writepage(), which ends up being called
through filemap_fdatawrite_range() in btrfs_start_ordered_extent(), was getting
-ENOSPC when calling the fill_delalloc callback. In this situation, it returned
without the writepage_end_io_hook callback (inode.c:btrfs_writepage_end_io_hook)
ever being called for the respective page, which prevents the ordered extent's
bytes_left count from ever reaching 0, and therefore a finish_ordered_fn work
is never queued into the endio_write_workers queue. This makes the task that
called btrfs_start_ordered_extent() hang forever on the wait queue of the ordered
extent.

This is fairly easy to reproduce using a small filesystem and fsstress on
a quad core vm:

    mkfs.btrfs -f -b `expr 2100 \* 1024 \* 1024` /dev/sdd
    mount /dev/sdd /mnt

    fsstress -p 6 -d /mnt -n 100000 -x \
        "btrfs subvolume snapshot -r /mnt /mnt/mysnap" \
	    -f allocsp=0 \
	    -f bulkstat=0 \
	    -f bulkstat1=0 \
	    -f chown=0 \
	    -f creat=1 \
	    -f dread=0 \
	    -f dwrite=0 \
	    -f fallocate=1 \
	    -f fdatasync=0 \
	    -f fiemap=0 \
	    -f freesp=0 \
	    -f fsync=0 \
	    -f getattr=0 \
	    -f getdents=0 \
	    -f link=0 \
	    -f mkdir=0 \
	    -f mknod=0 \
	    -f punch=1 \
	    -f read=0 \
	    -f readlink=0 \
	    -f rename=0 \
	    -f resvsp=0 \
	    -f rmdir=0 \
	    -f setxattr=0 \
	    -f stat=0 \
	    -f symlink=0 \
	    -f sync=0 \
	    -f truncate=1 \
	    -f unlink=0 \
	    -f unresvsp=0 \
	    -f write=4

So just ensure that if an error happens while writing the extent page
we call the writepage_end_io_hook callback. Also make it return the
error code and ensure the caller (extent_write_cache_pages) processes
all pages in the page vector even if an error happens only for some
of them, so that ordered extents end up released.

Signed-off-by: Filipe David Borba Manana <[email protected]>
Signed-off-by: Chris Mason <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.