Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add bounds checking to zil_parse #16308

Merged
merged 1 commit into from
Aug 1, 2024
Merged

Conversation

XDTG
Copy link
Contributor

@XDTG XDTG commented Jun 28, 2024

Add bounds checking to zil_parse to ensure log records don't stray beyond valid memory region.

Motivation and Context

This change fixes the memory out of bounds issue mentioned in #16246.

Description

Before accessing the next log record header, check that there are enough bytes left to accommodate a log record. And ensure the lr->lrc_reclen is appropriate.

How Has This Been Tested?

An error is returned when using the crafted image mentioned in the #16246.

root@syzkaller:~# /root/zfs/zpool import -d /root/zfs-poc.img myzpool
cannot import 'myzpool': one or more devices is currently unavailable

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@XDTG XDTG closed this Jun 28, 2024
@XDTG XDTG reopened this Jun 28, 2024
@XDTG
Copy link
Contributor Author

XDTG commented Jun 28, 2024

Added a space to meet code style check.

@XDTG
Copy link
Contributor Author

XDTG commented Jun 28, 2024

Modified commit message to meet format requirements.

Copy link
Member

@amotin amotin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally it looks good to me, except there is no EUCLEAR code on FreeBSD, and I am not sure how applicable it is to this case on Linux. When zil_read_log_block() does similar checks, it returns ECKSUM.

Another thought: ZFS relies on checksums to check data correct, and if they match, there should be endless number of ways to crash the system with maliciously crafted pool. I am not saying this patch is bad, but it is a drop in the ocean.

Copy link
Member

@amotin amotin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thought: previously ASSERT()'s there would crash the system if built with debug, that could be used for testing quiet, while error return in this case may pass unnoticed. I am actually surprised that your pool import failed on testing, since at least checksum errors are normal completion for ZIL replay and is not fatal. Though may be it is only specific for ECKSUM.

@XDTG
Copy link
Contributor Author

XDTG commented Jun 28, 2024

Thanks for your time. @amotin

Generally it looks good to me, except there is no EUCLEAR code on FreeBSD, and I am not sure how applicable it is to this case on Linux. When zil_read_log_block() does similar checks, it returns ECKSUM.

Oh sorry, I'm not familiar with FreeBSD. I also consider EIO and ENOSPC before. I refer to similar patches in Linux, so I use UCLEAN here. What is your suggestion about the return code? Thanks.

I think these are two scenarios:

  1. zil_read_log_block checks if checksum and data match
  2. patch checks the memory out of bounds

Another thought: ZFS relies on checksums to check data correct, and if they match, there should be endless number of ways to crash the system with maliciously crafted pool. I am not saying this patch is bad, but it is a drop in the ocean.

From a security perspective, this is an attack surface no matter how many such crashes there are. So we need to add sanity checks.

One more thought: previously ASSERT()'s there would crash the system if built with debug, that could be used for testing quiet, while error return in this case may pass unnoticed.

Yes, I also noticed this issue before when writing the patch. There is not much useful information in the error message. So should we maybe keep ASSERT()?

I am actually surprised that your pool import failed on testing, since at least checksum errors are normal completion for ZIL replay and is not fatal. Though may be it is only specific for ECKSUM.

I see, zil_check_log_chain thinks ECKSUM is not a fatal error. But if it's a crash issue I think we need to get a fatal error.

@amotin
Copy link
Member

amotin commented Jun 28, 2024

Generally it looks good to me, except there is no EUCLEAR code on FreeBSD, and I am not sure how applicable it is to this case on Linux. When zil_read_log_block() does similar checks, it returns ECKSUM.

Oh sorry, I'm not familiar with FreeBSD. I also consider EIO and ENOSPC before. I refer to similar patches in Linux, so I use UCLEAN here. What is your suggestion about the return code? Thanks.

I'd use ECKSUM to make it non-error. It may be not very descriptive, but there is no much good in aborting pool import due to corrupted ZIL.

One more thought: previously ASSERT()'s there would crash the system if built with debug, that could be used for testing quiet, while error return in this case may pass unnoticed.

Yes, I also noticed this issue before when writing the patch. There is not much useful information in the error message. So should we maybe keep ASSERT()?

May be.

Make sure log record don't stray beyond valid memory region.

There is a lack of verification of the space occupied by fixed members
of lr_t in the zil_parse.

We can create a crafted image to trigger an out of bounds read by
following these steps:
    1) Do some file operations and reboot to simulate abnormal exit
       without umount
    2) zil_chain.zc_nused: 0x1000
    3) First lr_t
       lr_t.lrc_txtype: 0x0
       lr_t.lrc_reclen: 0x1000-0xb8-0x1
       lr_t.lrc_txg: 0x0
       lr_t.lrc_seq: 0x1
    4) Update checksum in zil_chain.zc_eck

Fix:
Add some checks to make sure the remaining bytes are large enough to
hold an log record.

Signed-off-by: XDTG <[email protected]>
Copy link
Member

@amotin amotin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cmn_err() won't break CI, if happen somehow, but probably better than nothing.

@behlendorf behlendorf added the Status: Accepted Ready to integrate (reviewed, tested) label Jul 31, 2024
@tonyhutter tonyhutter merged commit ec580bc into openzfs:master Aug 1, 2024
22 of 25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants