Conversation
* Add `sccache` integration with `build_velox.sh` * Document how to setup and use `sccache`
sccache: Integrate with Velox buildsccache: integrate with Velox build
Remove non-existent temporary directory removal.
| First, set up authentication credentials: | ||
| ```bash | ||
| cd velox-testing/velox/scripts | ||
| ./setup_sccache_auth.sh |
There was a problem hiding this comment.
Running this script results in the following error:
ERROR: failed to build: failed to solve: failed to read dockerfile: open sccache_auth.dockerfile: no such file or directory
There was a problem hiding this comment.
Oh oops I think I forgot to include a dockerfile in this PR, my bad.
velox/scripts/build_velox.sh
Outdated
| sccache Usage: | ||
| The --sccache option enables distributed compilation caching using the RAPIDS sccache infrastructure. | ||
|
|
||
| Setup authentication first: | ||
| ./setup_sccache_auth.sh [output_dir] | ||
|
|
||
| Then use with build: | ||
| --sccache --sccache-auth-dir ~/.sccache-auth | ||
|
|
||
| Or set environment variable: | ||
| export SCCACHE_AUTH_DIR=~/.sccache-auth | ||
| ./build_velox.sh --sccache |
There was a problem hiding this comment.
Nit: Is this duplicating information in the README?
velox/scripts/build_velox.sh
Outdated
| $(basename "$0") --sccache --sccache-github-token ghp_xxx # Full distributed compilation | ||
| $(basename "$0") --sccache --sccache-show-stats # Build with sccache and show stats |
There was a problem hiding this comment.
Are --sccache-github-token and --sccache-show-stats supported options?
There was a problem hiding this comment.
Sorry these were vestigial option I had at one point and forgot to remove from docs.
| # Add sccache build arguments | ||
| if [[ "$ENABLE_SCCACHE" == true ]]; then | ||
| DOCKER_BUILD_OPTS+=(--build-arg ENABLE_SCCACHE="ON") | ||
| # Copy auth files to build context |
There was a problem hiding this comment.
Should this be cleaned up later?
There was a problem hiding this comment.
Sure, I put in a trap.
| mkdir -p ~/.config/sccache ~/.aws | ||
|
|
||
| # Install AWS credentials | ||
| cp /sccache_auth/aws_credentials ~/.aws/credentials |
There was a problem hiding this comment.
Will this override existing credential setup?
There was a problem hiding this comment.
Yes, there is a warning put in place to notify the user that this will happen here: https://github.com/rapidsai/velox-testing/pull/52/files/2cb983596eb88b719f86819ee4e6b88fc55dd1f2#diff-d2d6851dcf7cc303d86898bf66d55ab402ed49efa26304f2940a8d1218519026R28-R34
There was a problem hiding this comment.
This also only affects the ~/.aws/credentials in the docker container that is used to build velox, it does not affect the host.
| else | ||
| echo "Error: Distributed compilation not available, check connectivity" | ||
| exit 1 | ||
| fi No newline at end of file |
There was a problem hiding this comment.
Nit: Add newline to the end of file.
velox/scripts/setup_sccache_auth.sh
Outdated
| echo " ./build_velox.sh --sccache" | ||
| echo | ||
|
|
||
| echo -e "${GREEN}Setup complete!${NC}" No newline at end of file |
There was a problem hiding this comment.
Nit: Add newline to the end of file.
* Added missing dockerfile which was omitted in first commit
| echo "AWS credentials file preview:" | ||
| head -3 ~/.aws/credentials |
There was a problem hiding this comment.
Should we be printing credentials?
Similarly, should this script be enabling tracing (i.e. set -x) by default?
There was a problem hiding this comment.
Hmm I'm not sure why this is showing in GH, this section was deleted a while ago, please double check.
I've disabled set -x for this script now.
velox/scripts/build_velox.sh
Outdated
| -j|--num-threads NUM Number of threads to use for building (default: 3/4 of CPU cores). | ||
| --benchmarks true|false Enable benchmarks and nsys profiling tools (default: true). | ||
| --sccache Enable sccache distributed compilation caching. | ||
| --sccache-auth-dir DIR Directory containing sccache authentication files (github_token, aws_credentials). |
There was a problem hiding this comment.
Does this have to be an argument given that this directory is created by the setup script?
There was a problem hiding this comment.
I can remove this option and we can just rely on the relevant environment variable.
README.md
Outdated
|
|
||
| Then build Velox with sccache enabled: | ||
| ```bash | ||
| ./build_velox.sh --sccache --sccache-auth-dir ~/.sccache-auth |
There was a problem hiding this comment.
Running this results in the following error:
=> [stage-0 9/9] RUN --mount=type=bind,source=velox,target=/workspace/velox,ro set -euxo pipefail && if [ "ON" = "ON" ]; then 542.7s
=> => # ed-qualifiers -Wno-implicit-fallthrough -Wno-class-memaccess -Wno-comment -Wno-int-in-bool-context
=> => # -Wno-redundant-move -Wno-array-bounds -Wno-maybe-uninitialized -Wno-unused-result -Wno-for
=> => # mat-overflow -Wno-strict-aliasing -Wno-restrict -Werror -O3 -DNDEBUG -std=gnu++20 -fPIC -fdiagnostics-color=always -ffp-contr
=> => # act=off -fPIC -MD -MT velox/buffer/CMakeFiles/velox.dir/__/dwio/dwrf/writer/ColumnWriter.cpp.o -MF velox/buffer/CMakeFiles/velox.dir/_
=> => # _/dwio/dwrf/writer/ColumnWriter.cpp.o.d -o velox/buffer/CMakeFiles/velox.dir/__/dwio/dwrf/writer/ColumnWriter.cpp.o -c /workspace/velo
=> => # x/velox/dwio/dwrf/writer/ColumnWriter.cpp
failed to execute bake: exit status 1
There was a problem hiding this comment.
This looks like a compilation error unrelated to this PR, can you post more of the logs to confirm?
There was a problem hiding this comment.
Build logs have been attached. This is using this velox commit: facebookincubator/velox@8875b4d. Note that the build succeeds when sccache is not used.
./build_velox.sh --no-cache --log velox_build.log
velox_build.log
./build_velox.sh --no-cache --sccache --sccache-auth-dir ~/.sccache-auth --log velox_build_sccache.log
velox_build_sccache.log
There was a problem hiding this comment.
@paul-aiyedun Hmm, I've been able to build that commit (facebookincubator/velox@8875b4d) using the command
./build_velox.sh --no-cache --sccache --sccache-auth-dir ~/.sccache-auth --log velox_build_sccache.log successfully.
Build log:
velox_build_sccache.log
Could you please retry with a clean checkout/env to further debug this?
There was a problem hiding this comment.
After several attempts I was able to reproduce the issue above, and it looks like when distributed compilation is enabled with sccache it can compile with warnings that otherwise would not be hit. This causes the build to fail due to treating warnings as errors.
To address this, I've made distributed compilation an opt in flag, and provide a warning that observable behaviour of the compilers may differ if enabled (resulting in failed compilation.)
Do not enable tracing (`set -x`).
…sting into mattgara/sccache-velox
velox/scripts/build_velox.sh
Outdated
| exit 1 | ||
| fi | ||
| SCCACHE_AUTH_DIR="$2" | ||
| shift 2 |
There was a problem hiding this comment.
Oops this looks like a merge error by the automated merge in Cursor/VS Code.
Add missing ;; lost in automated merge of conflict.
There was a problem hiding this comment.
Is this dockerfile based on an existing dockerfile or documentation?
There was a problem hiding this comment.
This dockerfile is largely based on documentation in a slack channel which I can link offline.
velox/scripts/build_velox.sh
Outdated
| # Cleanup function to remove copied sccache auth files | ||
| cleanup_sccache_auth() { | ||
| if [[ "$ENABLE_SCCACHE" == true && -d "../docker/sccache/sccache_auth/" ]]; then | ||
| rm -f ../docker/sccache/sccache_auth/github_token ../docker/sccache/sccache_auth/aws_credentials |
There was a problem hiding this comment.
Nit: Delete the ../docker/sccache/sccache_auth directory?
082ca6e to
f7eb417
Compare
* Add flag to enable it. * PR revision
* PR revision
velox/scripts/build_velox.sh
Outdated
| TREAT_WARNINGS_AS_ERRORS="${TREAT_WARNINGS_AS_ERRORS:-1}" | ||
| LOGFILE="./build_velox.log" | ||
| ENABLE_SCCACHE=false | ||
| SCCACHE_AUTH_DIR="$HOME/.sccache-auth" |
There was a problem hiding this comment.
Make this to accept value from SCCACHE_AUTH_DIR env var or fallback to default "$HOME/.sccache-auth"
There was a problem hiding this comment.
Good point, thanks for the catch.
.github/workflows/velox-test.yml
Outdated
| cat > $HOME/.sccache-auth/aws_credentials << EOF | ||
| [default] | ||
| aws_access_key_id = $SCCACHE_AWS_ACCESS_KEY_ID | ||
| aws_secret_access_key = $SCCACHE_AWS_SECRET_ACCESS_KEY |
There was a problem hiding this comment.
Uses the default AWS credentials available through GHA secrets instead of generating them every time. The same applies for the GitHub token.
There was a problem hiding this comment.
I don't think we want to make these changes in this PR, enabling sccache for CI is a separate work item which will have a follow-up PR.
|
Relevant CI run., can we fix the CI before merging? |
See above. In summary, I think this is a separate work item that should not block this PR. |
Add
sccacheintegration withbuild_velox.shDocument how to setup and use
sccache