Some performance enhancements #72

Merged
aliddell merged 41 commits into main from os-specific-file-sink
Mar 25, 2025

@aliddell
Member

  • Uses platform-specific file I/O. Before the other interventions, I found this modestly improved performance of file I/O on all machines I tested on.
  • Precomputes several values which are often queried from ArrayDimension.
  • Uses OpenMP and wraps the main for loop in ArrayWriter::write_frame_to_chunks_() in a #pragma omp parallel for.
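As a rough illustration of the second bullet, here is a minimal sketch of precomputing per-dimension values once in a constructor so that the hot loop only reads cached results. The `Dim` and `DimsSketch` types and their field names are hypothetical stand-ins, not the project's actual `ArrayDimension` API.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for one array dimension; field names are assumptions.
struct Dim {
    std::uint32_t array_size_px;
    std::uint32_t chunk_size_px;
};

class DimsSketch {
  public:
    explicit DimsSketch(std::vector<Dim> dims) : dims_(std::move(dims)) {
        // Do the divisions once here instead of on every query.
        chunks_along_.reserve(dims_.size());
        for (const Dim& d : dims_) {
            // Ceiling division: a partially filled chunk still counts.
            const std::uint32_t n =
                (d.array_size_px + d.chunk_size_px - 1) / d.chunk_size_px;
            chunks_along_.push_back(n);
            n_chunks_total_ *= n;
        }
    }

    // Cheap lookups for code that queries these values per frame.
    std::uint32_t chunks_along(std::size_t i) const { return chunks_along_[i]; }
    std::size_t n_chunks_total() const { return n_chunks_total_; }

  private:
    std::vector<Dim> dims_;
    std::vector<std::uint32_t> chunks_along_;
    std::size_t n_chunks_total_ = 1;
};
```

The trade-off is a few extra bytes per object in exchange for removing repeated arithmetic from code that runs once per frame or per tile.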

@aliddell
Member Author

aliddell commented Mar 20, 2025

I still need to update the CI build and test scripts because the Mac (as per usual) needs some more TLC to get it right.

@aliddell aliddell force-pushed the os-specific-file-sink branch from 1377afa to 1f36f1d Compare March 21, 2025 15:04
@aliddell aliddell marked this pull request as draft March 21, 2025 18:07
@aliddell aliddell force-pushed the os-specific-file-sink branch from 73ecdc6 to 8bdf70d Compare March 21, 2025 22:36
Member

@nclack nclack left a comment

Left some comments. Be sure to break the performance improvements into multiple PRs if you can.

size_t bytes_written = 0;
const auto n_tiles = n_tiles_x * n_tiles_y;

#pragma omp parallel for reduction(+ : bytes_written)
Member

Check this with a benchmark, but this will scale better if you make bytes_written an atomic and use a plain old parallel for.

Most of the work here is not involved with the reduction, and the way reductions get parallelized might be limiting.

Maybe omp figures this out for you, but it's better not to rely on that.

Member Author

I actually see significant improvement with the reduction.

  • atomic: 1.081 GiB/s
  • reduction: 1.354 GiB/s

I ran multiple times and saw similar numbers.
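For readers comparing the two options in this thread, here is a minimal sketch of both variants. The per-tile work is reduced to adding a byte count, and the function names are hypothetical; the real loop body in `write_frame_to_chunks_()` is not shown in this conversation. Built without OpenMP, the pragmas are ignored and both functions run serially.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Variant 1: OpenMP reduction. Each thread accumulates into a private copy,
// and OpenMP combines the copies once when the loop finishes.
std::size_t total_with_reduction(const std::vector<std::size_t>& tile_bytes) {
    std::size_t bytes_written = 0;
    // A signed index keeps the loop in the form older OpenMP versions accept.
    const std::ptrdiff_t n_tiles = static_cast<std::ptrdiff_t>(tile_bytes.size());

#pragma omp parallel for reduction(+ : bytes_written)
    for (std::ptrdiff_t i = 0; i < n_tiles; ++i) {
        // Stand-in for copying one tile and counting its bytes.
        bytes_written += tile_bytes[static_cast<std::size_t>(i)];
    }
    return bytes_written;
}

// Variant 2: plain parallel for with a shared atomic counter. Every
// iteration touches the same memory location.
std::size_t total_with_atomic(const std::vector<std::size_t>& tile_bytes) {
    std::atomic<std::size_t> bytes_written{0};
    const std::ptrdiff_t n_tiles = static_cast<std::ptrdiff_t>(tile_bytes.size());

#pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n_tiles; ++i) {
        bytes_written.fetch_add(tile_bytes[static_cast<std::size_t>(i)],
                                std::memory_order_relaxed);
    }
    return bytes_written.load();
}
```

Both variants produce the same total; which one is faster depends on the workload and the machine, which is why the thread settles the question with a benchmark.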

: file_(filename.data(),
truncate ? (std::ios::binary | std::ios::trunc) : std::ios::binary)
namespace {
#ifdef _WIN32
Member

If there's a way to factor out the platform-dependent things into their own source files, it makes the code much easier to read and maintain.

Collaborator

100%

#undef min
#endif

#ifdef max
Member

You might be able to just say #undef without needing the #ifdef.

More importantly, where are these symbols leaking in from? If possible you should handle this as close to the source as possible.
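A minimal sketch of handling this at the source, assuming the `min` and `max` macros come from `<windows.h>` (the usual culprit, though this thread does not confirm it). Defining `NOMINMAX` before the include stops the header from defining them at all. The helper function name is hypothetical. On the first point: `#undef` applied to a name that is not currently a macro is simply ignored, so the `#ifdef` guard is optional.

```cpp
#ifdef _WIN32
// Ask <windows.h> not to define min/max macros in the first place.
#ifndef NOMINMAX
#define NOMINMAX
#endif
#include <windows.h>
#endif

#include <algorithm>
#include <cstddef>

// #undef on a name that is not a macro is a no-op, so no #ifdef is needed.
#undef min
#undef max

// Hypothetical helper: never write more bytes than remain in the buffer.
std::size_t clamp_to_remaining(std::size_t requested, std::size_t remaining) {
    // The parentheses also prevent function-like macro expansion, which
    // helps if some other header still defines min.
    return (std::min)(requested, remaining);
}
```

Keeping the `NOMINMAX` definition next to the one include that needs it (or setting it as a compile definition for Windows targets) avoids scattering `#undef` lines through unrelated code.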

@aliddell aliddell force-pushed the os-specific-file-sink branch from 79bf22c to ce52121 Compare March 24, 2025 16:57
jeskesen previously approved these changes Mar 24, 2025
@aliddell aliddell marked this pull request as ready for review March 24, 2025 18:49
Contributor

@jeskesen jeskesen left a comment

Trying to remove my "approve", because I hit it accidentally before.

@jeskesen jeskesen self-requested a review March 24, 2025 18:57
@jeskesen jeskesen dismissed their stale review March 24, 2025 18:58

Accidental approval

Collaborator

@shlomnissan shlomnissan left a comment

I’ve flagged one potential bug — it looks like the wrong pointer might be getting deleted, so I’m requesting changes just to ensure that’s double-checked.

Beyond that, I think there’s a lot of room to improve the robustness and readability of the code. The manual pointer management, in particular, feels risky and could lead to maintenance issues. If possible, I'd recommend avoiding raw pointers in favor of safer patterns.

That said, the rest of my comments are just suggestions — feel free to weigh them against your judgment.
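To illustrate the "safer patterns" recommendation, here is a minimal sketch of a buffer owned by `std::unique_ptr`, so that no code path calls `delete` by hand. The `ScratchBuffer` class is hypothetical; the code under review is not shown in this conversation.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical buffer wrapper: ownership lives in a unique_ptr, so there is
// no manual delete[] that could target the wrong pointer.
class ScratchBuffer {
  public:
    explicit ScratchBuffer(std::size_t n_bytes)
      : data_(std::make_unique<std::uint8_t[]>(n_bytes)), size_(n_bytes) {}

    // Reallocate, preserving as many existing bytes as fit.
    void resize(std::size_t n_bytes) {
        auto replacement = std::make_unique<std::uint8_t[]>(n_bytes);
        std::copy_n(data_.get(), std::min(size_, n_bytes), replacement.get());
        // Move-assignment frees the old block exactly once.
        data_ = std::move(replacement);
        size_ = n_bytes;
    }

    std::uint8_t* data() { return data_.get(); }
    std::size_t size() const { return size_; }

  private:
    std::unique_ptr<std::uint8_t[]> data_;
    std::size_t size_;
};
```

Because the old allocation is released by the move-assignment, deleting the wrong pointer during a swap or resize is no longer possible.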

# Apple Silicon
set(LIBOMP_PATH "/opt/homebrew/opt/libomp")
set(CMAKE_OSX_ARCHITECTURES "arm64")
else ()
Collaborator

This might be overkill, since the current check handles 99% of cases, but if you want to be defensive, you could improve this by:

  • Making the architecture checks case-insensitive
  • Adding a fallback for unknown architectures (e.g. older or future variants)
if(CMAKE_HOST_SYSTEM_PROCESSOR MATCHES "^[Aa][Rr][Mm]64")
    set(ARCH "arm64")
    set(LIBOMP_PATH "/opt/homebrew/opt/libomp")
elseif(CMAKE_HOST_SYSTEM_PROCESSOR MATCHES "^[Xx]86_64")
    set(ARCH "x86_64")
    set(LIBOMP_PATH "/usr/local/opt/libomp")
else()
    message(WARNING "Unknown architecture: ${CMAKE_HOST_SYSTEM_PROCESSOR}")
    set(ARCH "${CMAKE_HOST_SYSTEM_PROCESSOR}")
endif()

endif ()

# OpenMP support
set(OpenMP_C_FLAGS "-Xclang -fopenmp -I${LIBOMP_PATH}/include")
Collaborator

Is OpenMP being used in the C code as well?

Member Author

Not at this point, but in case any is added down the line I don't want the configuration to break inexplicably for future users.

  - name: Install vcpkg
    run: |
-     git clone https://github.com/microsoft/vcpkg.git
+     git clone https://github.com/microsoft/vcpkg.git -b 2025.02.14
Collaborator

Out of curiosity — what’s the reason for tagging vcpkg here?

Member Author

I ran into an issue in CI on Friday afternoon which was introduced with this and apparently resolved with this. I decided to tag a known working version so this doesn't happen again.

./vcpkg integrate install
shell: bash

- name: Install OpenMP
Collaborator

I'd recommend renaming this task to "Install OpenMP on macOS" to make the platform-specific purpose clearer. The same comment applies to all instances in this PR.

@aliddell aliddell force-pushed the os-specific-file-sink branch from 248a378 to bf1bc86 Compare March 25, 2025 14:41
@aliddell
Member Author

Thanks for the comments, @nclack @shlomnissan. I've made the following changes:

  • Renamed macos-toolchain.cmake to openmp.cmake.
  • Added instructions to the README for Mac users to install OpenMP with Homebrew.
  • Broke out Windows and POSIX implementations for file.sink.cpp.
  • Added a comment above #pragma omp parallel for describing why we only use 75% of the thread pool allotment for OMP parallelism. (Will make an issue for this once it's merged.)
  • Updated the interior of the parfor loop to have some more descriptive variable names.
    • Added some more verbiage to help with debugging in the case of buffer overrun.
  • Used auto and static_cast<T> a bit more liberally in the file I/O code.
  • Fixed the bug Shlomi found re: deleting the wrong pointer.
  • Consolidated a few for loops in the ArrayDimension constructor.
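The 75% thread budget mentioned in the list above could look roughly like the sketch below. It assumes the available thread count comes from `std::thread::hardware_concurrency()`; the PR may source it from elsewhere, and both function names are hypothetical. Built without OpenMP, the pragma is ignored and the loop runs serially.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: 75% of the available threads, rounded down, but
// never fewer than one.
int omp_thread_budget(unsigned available) {
    return static_cast<int>(std::max(1u, available * 3u / 4u));
}

std::size_t sum_tile_bytes(const std::vector<std::size_t>& tile_bytes) {
    // hardware_concurrency() may return 0 when the count is unknown; the
    // helper maps that case to a single thread.
    const int n_threads =
        omp_thread_budget(std::thread::hardware_concurrency());
    static_cast<void>(n_threads); // unused when built without OpenMP

    std::size_t total = 0;
    const std::ptrdiff_t n_tiles = static_cast<std::ptrdiff_t>(tile_bytes.size());

#pragma omp parallel for num_threads(n_threads) reduction(+ : total)
    for (std::ptrdiff_t i = 0; i < n_tiles; ++i) {
        total += tile_bytes[static_cast<std::size_t>(i)];
    }
    return total;
}
```

Leaving a quarter of the threads free is presumably meant to keep other work in the process responsive; the comment added in the PR is the authoritative explanation.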

@aliddell aliddell requested a review from shlomnissan March 25, 2025 15:48
Collaborator

@shlomnissan shlomnissan left a comment

Unblocking now that the bug is fixed.

@aliddell aliddell merged commit fafffc3 into main Mar 25, 2025
7 checks passed
@aliddell aliddell deleted the os-specific-file-sink branch March 25, 2025 17:18

4 participants