Skip to content

Conversation

@yinggeh
Copy link
Contributor

@yinggeh yinggeh commented Aug 6, 2025

What does the PR do?

When calling mmap system call, the returned starting address is starting at the shm register offset (must be multiples of page size) already. Later the register offset is added to the starting address again and raises out-of-bound write issue.

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type
box here and add the label to the github PR.

  • fix

Related PRs:

Where should the reviewer start?

Test plan:

L0_shared_memory
L0_cuda_shared_memory

  • CI Pipeline ID:
    32791300

Caveats:

Background

https://man7.org/linux/man-pages/man2/mmap.2.html

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

@yinggeh yinggeh self-assigned this Aug 6, 2025
@yinggeh yinggeh added the bug Something isn't working label Aug 6, 2025
@yinggeh yinggeh requested a review from Copilot August 6, 2025 18:17
@yinggeh yinggeh changed the title fix: Proper handling of system shm starting address fix: Proper handling of system shm register offset Aug 6, 2025

This comment was marked as outdated.

@yinggeh yinggeh requested a review from Copilot August 6, 2025 18:29

This comment was marked as outdated.

@yinggeh yinggeh requested a review from Copilot August 6, 2025 18:45
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a bug in shared memory handling where the register offset was being applied twice during memory mapping operations, leading to out-of-bounds read issues. The fix ensures that when mmap returns a starting address that already includes the register offset, the offset is not added again during memory access.

Key changes:

  • Removes duplicate register offset calculation in the C++ shared memory manager
  • Updates test utilities to properly handle register offsets for different memory types
  • Adds comprehensive tests for large shared memory register offsets across multiple backend types

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
src/shared_memory_manager.cc Removes duplicate register offset calculation for CPU memory types
qa/common/infer_util.py Updates test utility to handle register offsets and support multiple model backends
qa/L0_shared_memory/test.sh Adds test infrastructure for large shared memory register offset testing
qa/L0_shared_memory/shared_memory_test.py Implements comprehensive test for large register offsets across backend types

@yinggeh yinggeh requested a review from tanmayv25 August 6, 2025 18:48
*shm_info = std::static_pointer_cast<const SharedMemoryInfo>(it->second);
}

if (it->second->kind_ == TRITONSERVER_MEMORY_CPU) {
Copy link
Contributor Author

@yinggeh yinggeh Aug 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

System shm is treated differently from CUDA shm. mmap returns the address starting at the offset. Therefore no need to add another offset to it.

@yinggeh yinggeh merged commit a4d1897 into main Aug 7, 2025
3 checks passed
@yinggeh yinggeh deleted the yinggeh-DLIS-8400-fix-shm-offset branch August 7, 2025 11:43
mc-nv added a commit that referenced this pull request Aug 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

3 participants