Skip to content

Commit

Permalink
Auto merge of #117329 - RalfJung:offset-by-zero, r=oli-obk,scottmcm
Browse files Browse the repository at this point in the history
offset: allow zero-byte offset on arbitrary pointers

As per prior `@rust-lang/opsem` [discussion](rust-lang/opsem-team#10) and [FCP](rust-lang/unsafe-code-guidelines#472 (comment)):

- Zero-sized reads and writes are allowed on all sufficiently aligned pointers, including the null pointer
- Inbounds-offset-by-zero is allowed on all pointers, including the null pointer
- `offset_from` on two pointers derived from the same allocation is always allowed when they have the same address

This removes surprising UB (in particular, even C++ allows "nullptr + 0", which we currently disallow), and it brings us one step closer to an important theoretical property for our semantics ("provenance monotonicity": if operations are valid on bytes without provenance, then adding provenance can't make them invalid).

The minimum LLVM we require (v17) includes https://reviews.llvm.org/D154051, so we can finally implement this.

The `offset_from` change is needed to maintain the equivalence with `offset`: if `let ptr2 = ptr1.offset(N)` is well-defined, then `ptr2.offset_from(ptr1)` should be well-defined and return N. Now consider the case where N is 0 and `ptr1` dangles: we want to still allow offset_from here.

I think we should change offset_from further, but that's a separate discussion.

Fixes #65108
[Tracking issue](#117945) | [T-lang summary](#117329 (comment))

Cc `@nikic`
  • Loading branch information
bors committed May 22, 2024
2 parents f0038a7 + 9526ce6 commit 5d328a1
Show file tree
Hide file tree
Showing 48 changed files with 202 additions and 476 deletions.
20 changes: 15 additions & 5 deletions compiler/rustc_const_eval/src/const_eval/machine.rs
Original file line number Diff line number Diff line change
Expand Up @@ -25,9 +25,9 @@ use rustc_target::spec::abi::Abi as CallAbi;
use crate::errors::{LongRunning, LongRunningWarn};
use crate::fluent_generated as fluent;
use crate::interpret::{
self, compile_time_machine, err_ub, throw_exhaust, throw_inval, throw_ub_custom,
self, compile_time_machine, err_ub, throw_exhaust, throw_inval, throw_ub_custom, throw_unsup,
throw_unsup_format, AllocId, AllocRange, ConstAllocation, CtfeProvenance, FnArg, FnVal, Frame,
ImmTy, InterpCx, InterpResult, MPlaceTy, OpTy, Pointer, PointerArithmetic, Scalar,
GlobalAlloc, ImmTy, InterpCx, InterpResult, MPlaceTy, OpTy, Pointer, PointerArithmetic, Scalar,
};

use super::error::*;
Expand Down Expand Up @@ -759,11 +759,21 @@ impl<'mir, 'tcx> interpret::Machine<'mir, 'tcx> for CompileTimeInterpreter<'mir,
ecx: &InterpCx<'mir, 'tcx, Self>,
alloc_id: AllocId,
) -> InterpResult<'tcx> {
// Check if this is the currently evaluated static.
if Some(alloc_id) == ecx.machine.static_root_ids.map(|(id, _)| id) {
Err(ConstEvalErrKind::RecursiveStatic.into())
} else {
Ok(())
return Err(ConstEvalErrKind::RecursiveStatic.into());
}
// If this is another static, make sure we fire off the query to detect cycles.
// But only do that when checks for static recursion are enabled.
if ecx.machine.static_root_ids.is_some() {
if let Some(GlobalAlloc::Static(def_id)) = ecx.tcx.try_get_global_alloc(alloc_id) {
if ecx.tcx.is_foreign_item(def_id) {
throw_unsup!(ExternStatic(def_id));
}
ecx.ctfe_query(|tcx| tcx.eval_static_initializer(def_id))?;
}
}
Ok(())
}
}

Expand Down
1 change: 1 addition & 0 deletions compiler/rustc_const_eval/src/interpret/intrinsics.rs
Original file line number Diff line number Diff line change
Expand Up @@ -255,6 +255,7 @@ impl<'mir, 'tcx: 'mir, M: Machine<'mir, 'tcx>> InterpCx<'mir, 'tcx, M> {
name = intrinsic_name,
);
}
// This will always return 0.
(a, b)
}
(Err(_), _) | (_, Err(_)) => {
Expand Down
39 changes: 19 additions & 20 deletions compiler/rustc_const_eval/src/interpret/memory.rs
Original file line number Diff line number Diff line change
Expand Up @@ -413,6 +413,8 @@ impl<'mir, 'tcx: 'mir, M: Machine<'mir, 'tcx>> InterpCx<'mir, 'tcx, M> {
/// to the allocation it points to. Supports both shared and mutable references, as the actual
/// checking is offloaded to a helper closure.
///
/// `alloc_size` will only get called for non-zero-sized accesses.
///
/// Returns `None` if and only if the size is 0.
fn check_and_deref_ptr<T>(
&self,
Expand All @@ -425,18 +427,19 @@ impl<'mir, 'tcx: 'mir, M: Machine<'mir, 'tcx>> InterpCx<'mir, 'tcx, M> {
M::ProvenanceExtra,
) -> InterpResult<'tcx, (Size, Align, T)>,
) -> InterpResult<'tcx, Option<T>> {
// Everything is okay with size 0.
if size.bytes() == 0 {
return Ok(None);
}

Ok(match self.ptr_try_get_alloc_id(ptr) {
Err(addr) => {
// We couldn't get a proper allocation. This is only okay if the access size is 0,
// and the address is not null.
if size.bytes() > 0 || addr == 0 {
throw_ub!(DanglingIntPointer(addr, msg));
}
None
// We couldn't get a proper allocation.
throw_ub!(DanglingIntPointer(addr, msg));
}
Ok((alloc_id, offset, prov)) => {
let (alloc_size, _alloc_align, ret_val) = alloc_size(alloc_id, offset, prov)?;
// Test bounds. This also ensures non-null.
// Test bounds.
// It is sufficient to check this for the end pointer. Also check for overflow!
if offset.checked_add(size, &self.tcx).map_or(true, |end| end > alloc_size) {
throw_ub!(PointerOutOfBounds {
Expand All @@ -447,14 +450,8 @@ impl<'mir, 'tcx: 'mir, M: Machine<'mir, 'tcx>> InterpCx<'mir, 'tcx, M> {
msg,
})
}
// Ensure we never consider the null pointer dereferenceable.
if M::Provenance::OFFSET_IS_ADDR {
assert_ne!(ptr.addr(), Size::ZERO);
}

// We can still be zero-sized in this branch, in which case we have to
// return `None`.
if size.bytes() == 0 { None } else { Some(ret_val) }
Some(ret_val)
}
})
}
Expand Down Expand Up @@ -641,16 +638,18 @@ impl<'mir, 'tcx: 'mir, M: Machine<'mir, 'tcx>> InterpCx<'mir, 'tcx, M> {
size,
CheckInAllocMsg::MemoryAccessTest,
|alloc_id, offset, prov| {
if !self.memory.validation_in_progress.get() {
// We want to call the hook on *all* accesses that involve an AllocId,
// including zero-sized accesses. That means we have to do it here
// rather than below in the `Some` branch.
M::before_alloc_read(self, alloc_id)?;
}
let alloc = self.get_alloc_raw(alloc_id)?;
Ok((alloc.size(), alloc.align, (alloc_id, offset, prov, alloc)))
},
)?;
// We want to call the hook on *all* accesses that involve an AllocId, including zero-sized
// accesses. That means we cannot rely on the closure above or the `Some` branch below. We
// do this after `check_and_deref_ptr` to ensure some basic sanity has already been checked.
if !self.memory.validation_in_progress.get() {
if let Ok((alloc_id, ..)) = self.ptr_try_get_alloc_id(ptr) {
M::before_alloc_read(self, alloc_id)?;
}
}

if let Some((alloc_id, offset, prov, alloc)) = ptr_and_alloc {
let range = alloc_range(offset, size);
Expand Down
12 changes: 10 additions & 2 deletions compiler/rustc_const_eval/src/interpret/validity.rs
Original file line number Diff line number Diff line change
Expand Up @@ -434,6 +434,11 @@ impl<'rt, 'mir, 'tcx: 'mir, M: Machine<'mir, 'tcx>> ValidityVisitor<'rt, 'mir, '
found_bytes: has.bytes()
},
);
// Make sure this is non-null. We checked dereferenceability above, but if `size` is zero
// that does not imply non-null.
if self.ecx.scalar_may_be_null(Scalar::from_maybe_pointer(place.ptr(), self.ecx))? {
throw_validation_failure!(self.path, NullPtr { ptr_kind })
}
// Do not allow pointers to uninhabited types.
if place.layout.abi.is_uninhabited() {
let ty = place.layout.ty;
Expand All @@ -456,8 +461,8 @@ impl<'rt, 'mir, 'tcx: 'mir, M: Machine<'mir, 'tcx>> ValidityVisitor<'rt, 'mir, '
// `!` is a ZST and we want to validate it.
if let Ok((alloc_id, _offset, _prov)) = self.ecx.ptr_try_get_alloc_id(place.ptr()) {
let mut skip_recursive_check = false;
let alloc_actual_mutbl = mutability(self.ecx, alloc_id);
if let GlobalAlloc::Static(did) = self.ecx.tcx.global_alloc(alloc_id) {
if let Some(GlobalAlloc::Static(did)) = self.ecx.tcx.try_get_global_alloc(alloc_id)
{
let DefKind::Static { nested, .. } = self.ecx.tcx.def_kind(did) else { bug!() };
// Special handling for pointers to statics (irrespective of their type).
assert!(!self.ecx.tcx.is_thread_local_static(did));
Expand Down Expand Up @@ -495,6 +500,7 @@ impl<'rt, 'mir, 'tcx: 'mir, M: Machine<'mir, 'tcx>> ValidityVisitor<'rt, 'mir, '
// If this allocation has size zero, there is no actual mutability here.
let (size, _align, _alloc_kind) = self.ecx.get_alloc_info(alloc_id);
if size != Size::ZERO {
let alloc_actual_mutbl = mutability(self.ecx, alloc_id);
// Mutable pointer to immutable memory is no good.
if ptr_expected_mutbl == Mutability::Mut
&& alloc_actual_mutbl == Mutability::Not
Expand Down Expand Up @@ -831,6 +837,8 @@ impl<'rt, 'mir, 'tcx: 'mir, M: Machine<'mir, 'tcx>> ValueVisitor<'mir, 'tcx, M>
trace!("visit_value: {:?}, {:?}", *op, op.layout);

// Check primitive types -- the leaves of our recursive descent.
// We assume that the Scalar validity range does not restrict these values
// any further than `try_visit_primitive` does!
if self.try_visit_primitive(op)? {
return Ok(());
}
Expand Down
10 changes: 5 additions & 5 deletions library/core/src/intrinsics.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1483,10 +1483,10 @@ extern "rust-intrinsic" {
///
/// # Safety
///
/// Both the starting and resulting pointer must be either in bounds or one
/// byte past the end of an allocated object. If either pointer is out of
/// bounds or arithmetic overflow occurs then any further use of the
/// returned value will result in undefined behavior.
/// If the computed offset is non-zero, then both the starting and resulting pointer must be
/// either in bounds or at the end of an allocated object. If either pointer is out
/// of bounds or arithmetic overflow occurs then any further use of the returned value will
/// result in undefined behavior.
///
/// The stabilized version of this intrinsic is [`pointer::offset`].
#[must_use = "returns a new pointer rather than modifying its argument"]
Expand All @@ -1502,7 +1502,7 @@ extern "rust-intrinsic" {
/// # Safety
///
/// Unlike the `offset` intrinsic, this intrinsic does not restrict the
/// resulting pointer to point into or one byte past the end of an allocated
/// resulting pointer to point into or at the end of an allocated
/// object, and it wraps with two's complement arithmetic. The resulting
/// value is not necessarily valid to be used to actually access memory.
///
Expand Down
23 changes: 13 additions & 10 deletions library/core/src/ptr/const_ptr.rs
Original file line number Diff line number Diff line change
Expand Up @@ -465,8 +465,9 @@ impl<T: ?Sized> *const T {
/// If any of the following conditions are violated, the result is Undefined
/// Behavior:
///
/// * Both the starting and resulting pointer must be either in bounds or one
/// byte past the end of the same [allocated object].
/// * If the computed offset, **in bytes**, is non-zero, then both the starting and resulting
/// pointer must be either in bounds or at the end of the same [allocated object].
/// (If it is zero, then the function is always well-defined.)
///
/// * The computed offset, **in bytes**, cannot overflow an `isize`.
///
Expand Down Expand Up @@ -676,11 +677,11 @@ impl<T: ?Sized> *const T {
/// If any of the following conditions are violated, the result is Undefined
/// Behavior:
///
/// * Both `self` and `origin` must be either in bounds or one
/// byte past the end of the same [allocated object].
/// * `self` and `origin` must either
///
/// * Both pointers must be *derived from* a pointer to the same object.
/// (See below for an example.)
/// * both be *derived from* a pointer to the same [allocated object], and the memory range between
/// the two pointers must be either empty or in bounds of that object. (See below for an example.)
/// * or both be derived from an integer literal/constant, and point to the same address.
///
/// * The distance between the pointers, in bytes, must be an exact multiple
/// of the size of `T`.
Expand Down Expand Up @@ -951,8 +952,9 @@ impl<T: ?Sized> *const T {
/// If any of the following conditions are violated, the result is Undefined
/// Behavior:
///
/// * Both the starting and resulting pointer must be either in bounds or one
/// byte past the end of the same [allocated object].
/// * If the computed offset, **in bytes**, is non-zero, then both the starting and resulting
/// pointer must be either in bounds or at the end of the same [allocated object].
/// (If it is zero, then the function is always well-defined.)
///
/// * The computed offset, **in bytes**, cannot overflow an `isize`.
///
Expand Down Expand Up @@ -1035,8 +1037,9 @@ impl<T: ?Sized> *const T {
/// If any of the following conditions are violated, the result is Undefined
/// Behavior:
///
/// * Both the starting and resulting pointer must be either in bounds or one
/// byte past the end of the same [allocated object].
/// * If the computed offset, **in bytes**, is non-zero, then both the starting and resulting
/// pointer must be either in bounds or at the end of the same [allocated object].
/// (If it is zero, then the function is always well-defined.)
///
/// * The computed offset cannot exceed `isize::MAX` **bytes**.
///
Expand Down
11 changes: 3 additions & 8 deletions library/core/src/ptr/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -15,18 +15,13 @@
//! The precise rules for validity are not determined yet. The guarantees that are
//! provided at this point are very minimal:
//!
//! * A [null] pointer is *never* valid, not even for accesses of [size zero][zst].
//! * For operations of [size zero][zst], *every* pointer is valid, including the [null] pointer.
//! The following points are only concerned with non-zero-sized accesses.
//! * A [null] pointer is *never* valid.
//! * For a pointer to be valid, it is necessary, but not always sufficient, that the pointer
//! be *dereferenceable*: the memory range of the given size starting at the pointer must all be
//! within the bounds of a single allocated object. Note that in Rust,
//! every (stack-allocated) variable is considered a separate allocated object.
//! * Even for operations of [size zero][zst], the pointer must not be pointing to deallocated
//! memory, i.e., deallocation makes pointers invalid even for zero-sized operations. However,
//! casting any non-zero integer *literal* to a pointer is valid for zero-sized accesses, even if
//! some memory happens to exist at that address and gets deallocated. This corresponds to writing
//! your own allocator: allocating zero-sized objects is not very hard. The canonical way to
//! obtain a pointer that is valid for zero-sized accesses is [`NonNull::dangling`].
//FIXME: mention `ptr::dangling` above, once it is stable.
//! * All accesses performed by functions in this module are *non-atomic* in the sense
//! of [atomic operations] used to synchronize between threads. This means it is
//! undefined behavior to perform two concurrent accesses to the same location from different
Expand Down
23 changes: 13 additions & 10 deletions library/core/src/ptr/mut_ptr.rs
Original file line number Diff line number Diff line change
Expand Up @@ -480,8 +480,9 @@ impl<T: ?Sized> *mut T {
/// If any of the following conditions are violated, the result is Undefined
/// Behavior:
///
/// * Both the starting and resulting pointer must be either in bounds or one
/// byte past the end of the same [allocated object].
/// * If the computed offset, **in bytes**, is non-zero, then both the starting and resulting
/// pointer must be either in bounds or at the end of the same [allocated object].
/// (If it is zero, then the function is always well-defined.)
///
/// * The computed offset, **in bytes**, cannot overflow an `isize`.
///
Expand Down Expand Up @@ -904,11 +905,11 @@ impl<T: ?Sized> *mut T {
/// If any of the following conditions are violated, the result is Undefined
/// Behavior:
///
/// * Both `self` and `origin` must be either in bounds or one
/// byte past the end of the same [allocated object].
/// * `self` and `origin` must either
///
/// * Both pointers must be *derived from* a pointer to the same object.
/// (See below for an example.)
/// * both be *derived from* a pointer to the same [allocated object], and the memory range between
/// the two pointers must be either empty or in bounds of that object. (See below for an example.)
/// * or both be derived from an integer literal/constant, and point to the same address.
///
/// * The distance between the pointers, in bytes, must be an exact multiple
/// of the size of `T`.
Expand Down Expand Up @@ -1095,8 +1096,9 @@ impl<T: ?Sized> *mut T {
/// If any of the following conditions are violated, the result is Undefined
/// Behavior:
///
/// * Both the starting and resulting pointer must be either in bounds or one
/// byte past the end of the same [allocated object].
/// * If the computed offset, **in bytes**, is non-zero, then both the starting and resulting
/// pointer must be either in bounds or at the end of the same [allocated object].
/// (If it is zero, then the function is always well-defined.)
///
/// * The computed offset, **in bytes**, cannot overflow an `isize`.
///
Expand Down Expand Up @@ -1179,8 +1181,9 @@ impl<T: ?Sized> *mut T {
/// If any of the following conditions are violated, the result is Undefined
/// Behavior:
///
/// * Both the starting and resulting pointer must be either in bounds or one
/// byte past the end of the same [allocated object].
/// * If the computed offset, **in bytes**, is non-zero, then both the starting and resulting
/// pointer must be either in bounds or at the end of the same [allocated object].
/// (If it is zero, then the function is always well-defined.)
///
/// * The computed offset cannot exceed `isize::MAX` **bytes**.
///
Expand Down
10 changes: 0 additions & 10 deletions src/tools/miri/tests/fail/dangling_pointers/dangling_zst_deref.rs

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

Loading

0 comments on commit 5d328a1

Please sign in to comment.