Merged
19 changes: 19 additions & 0 deletions README.md
@@ -21,6 +21,25 @@ A Docker-based build infrastructure has been added to facilitate building Velox

Specifically, the `velox-testing` and `velox` repositories must be checked out as sibling directories under the same parent directory. Once that is done, navigate (`cd`) into the `velox-testing/velox/scripts` directory and execute the build script `build_velox.sh`. After a successful build, the Velox libraries and executables are available in the container at `/opt/velox-build/release`.

## `sccache` Usage
`sccache` has been integrated to significantly accelerate builds by using a build server to share cached compiled objects during the build process. Currently, it is supported only for Velox builds (not Presto).

The fork `rapidsai/sccache` is integrated and is currently configured for use with the `NVIDIA` GitHub organization.

### Setup and Usage
First, set up authentication credentials:
```bash
cd velox-testing/velox/scripts
./setup_sccache_auth.sh

Contributor

Running this script results in the following error:

ERROR: failed to build: failed to solve: failed to read dockerfile: open sccache_auth.dockerfile: no such file or directory

Contributor Author

Oh oops I think I forgot to include a dockerfile in this PR, my bad.

```

Then build Velox with sccache enabled:
```bash
./build_velox.sh --sccache --sccache-auth-dir ~/.sccache-auth

Contributor

Running this results in the following error:

 => [stage-0 9/9] RUN --mount=type=bind,source=velox,target=/workspace/velox,ro     set -euxo pipefail &&     if [ "ON" = "ON" ]; then   542.7s
 => => # ed-qualifiers        -Wno-implicit-fallthrough          -Wno-class-memaccess          -Wno-comment          -Wno-int-in-bool-context  
 => => #         -Wno-redundant-move          -Wno-array-bounds          -Wno-maybe-uninitialized          -Wno-unused-result          -Wno-for
 => => # mat-overflow          -Wno-strict-aliasing -Wno-restrict -Werror -O3 -DNDEBUG -std=gnu++20 -fPIC -fdiagnostics-color=always -ffp-contr
 => => # act=off -fPIC -MD -MT velox/buffer/CMakeFiles/velox.dir/__/dwio/dwrf/writer/ColumnWriter.cpp.o -MF velox/buffer/CMakeFiles/velox.dir/_
 => => # _/dwio/dwrf/writer/ColumnWriter.cpp.o.d -o velox/buffer/CMakeFiles/velox.dir/__/dwio/dwrf/writer/ColumnWriter.cpp.o -c /workspace/velo
 => => # x/velox/dwio/dwrf/writer/ColumnWriter.cpp                                                                                             
failed to execute bake: exit status 1

@mattgara (Contributor Author), Sep 24, 2025

This looks like a compilation error unrelated to this PR, can you post more of the logs to confirm?

Contributor

Build logs are attached. This uses Velox commit facebookincubator/velox@8875b4d. Note that the build succeeds when sccache is not used.

./build_velox.sh --no-cache --log velox_build.log
velox_build.log

./build_velox.sh --no-cache --sccache --sccache-auth-dir ~/.sccache-auth --log velox_build_sccache.log
velox_build_sccache.log

@mattgara (Contributor Author), Oct 7, 2025

@paul-aiyedun Hmm, I was able to build that commit (facebookincubator/velox@8875b4d) successfully using the command:

./build_velox.sh --no-cache --sccache --sccache-auth-dir ~/.sccache-auth --log velox_build_sccache.log

Build log:
velox_build_sccache.log

Could you please retry with a clean checkout/env to further debug this?

Contributor Author

After several attempts I was able to reproduce the issue above. It looks like, when distributed compilation is enabled with sccache, the compilation can produce warnings that would not otherwise be hit. Because warnings are treated as errors, this causes the build to fail.

To address this, I've made distributed compilation an opt-in flag and added a warning that the observable behaviour of the compilers may differ when it is enabled (possibly resulting in failed compilation).

```

The authentication setup needs to be done only once per credential lifetime: the credentials are valid for 12 hours.
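
Because the credentials expire after 12 hours, a wrapper script could check the age of the auth files before deciding whether `setup_sccache_auth.sh` needs to be rerun. The following is a minimal sketch and not part of this PR: the function name `sccache_auth_is_fresh` is hypothetical, and it assumes that the files' modification times reflect when the credentials were issued and that they live in `~/.sccache-auth`, as in the build command above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): succeeds only if both auth files
# exist and were modified less than 12 hours (720 minutes) ago.
sccache_auth_is_fresh() {
  local dir="$1" name
  for name in github_token aws_credentials; do
    # find prints the path only when the file is newer than 720 minutes.
    [[ -f "$dir/$name" && -n "$(find "$dir/$name" -mmin -720 2>/dev/null)" ]] || return 1
  done
}

AUTH_DIR="${HOME}/.sccache-auth"   # assumed location, taken from the README example
if sccache_auth_is_fresh "$AUTH_DIR"; then
  echo "sccache credentials look fresh"
else
  echo "sccache credentials missing or older than 12 hours; rerun ./setup_sccache_auth.sh"
fi
```

The check is only a heuristic: it says nothing about whether the token itself is still accepted by the server.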

## Velox Benchmarking
A Docker-based benchmarking infrastructure has been added to facilitate running Velox benchmarks with support for CPU/GPU execution engines and profiling capabilities. The infrastructure uses a dedicated `velox-benchmark` Docker service with pre-configured volume mounts that automatically sync benchmark data and results. The data follows Hive directory structure, making it compatible with Presto. Currently, only TPC-H is implemented, but the infrastructure is designed to be easily extended to support additional benchmarks in the future.

46 changes: 42 additions & 4 deletions velox/docker/adapters_build.dockerfile
@@ -11,6 +11,7 @@ ARG TREAT_WARNINGS_AS_ERRORS=1
ARG VELOX_ENABLE_BENCHMARKS=ON
ARG BUILD_BASE_DIR=/opt/velox-build
ARG BUILD_TYPE=release
ARG ENABLE_SCCACHE=OFF

# Environment mirroring upstream CI defaults and incorporating build args
ENV VELOX_DEPENDENCY_SOURCE=SYSTEM \
@@ -43,13 +44,27 @@ ${BUILD_BASE_DIR}/${BUILD_TYPE}/_deps/cudf-build:\
${BUILD_BASE_DIR}/${BUILD_TYPE}/_deps/rmm-build:\
${BUILD_BASE_DIR}/${BUILD_TYPE}/_deps/rapids_logger-build:\
${BUILD_BASE_DIR}/${BUILD_TYPE}/_deps/kvikio-build:\
-${BUILD_BASE_DIR}/${BUILD_TYPE}/_deps/nvcomp_proprietary_binary-src/lib64"
${BUILD_BASE_DIR}/${BUILD_TYPE}/_deps/nvcomp_proprietary_binary-src/lib64" \
ENABLE_SCCACHE=${ENABLE_SCCACHE}

WORKDIR /workspace/velox

# Print environment variables for debugging
RUN printenv | sort

# Install sccache if enabled
RUN if [ "$ENABLE_SCCACHE" = "ON" ]; then \
set -euxo pipefail && \
# Install RAPIDS sccache fork
wget --no-hsts -q -O- "https://github.com/rapidsai/sccache/releases/download/v0.10.0-rapids.68/sccache-v0.10.0-rapids.68-$(uname -m)-unknown-linux-musl.tar.gz" | \
tar -C /usr/bin -zf - --wildcards --strip-components=1 -x '*/sccache' 2>/dev/null && \
chmod +x /usr/bin/sccache && \
# Verify installation
sccache --version; \
else \
echo "Skipping sccache installation (ENABLE_SCCACHE=OFF)"; \
fi

# Install NVIDIA Nsight Systems (nsys) for profiling - only if benchmarks are enabled
RUN if [ "$VELOX_ENABLE_BENCHMARKS" = "ON" ]; then \
set -euxo pipefail && \
@@ -66,8 +81,31 @@ RUN if [ "$VELOX_ENABLE_BENCHMARKS" = "ON" ]; then \
echo "Skipping nsys installation (VELOX_ENABLE_BENCHMARKS=OFF)"; \
fi

# Build using the specified build type and directory
# Copy sccache setup script (if sccache enabled)
COPY velox-testing/velox/docker/sccache/sccache_setup.sh /sccache_setup.sh
RUN if [ "$ENABLE_SCCACHE" = "ON" ]; then chmod +x /sccache_setup.sh; fi

# Copy sccache auth files (note source of copy must be within the docker build context)
COPY velox-testing/velox/docker/sccache/sccache_auth/ /sccache_auth/

# Build in Release mode into ${BUILD_BASE_DIR}
RUN --mount=type=bind,source=velox,target=/workspace/velox,ro \
set -euxo pipefail && \
-make cmake BUILD_DIR="${BUILD_TYPE}" BUILD_TYPE="${BUILD_TYPE}" EXTRA_CMAKE_FLAGS="${EXTRA_CMAKE_FLAGS[*]}" BUILD_BASE_DIR="${BUILD_BASE_DIR}" && \
-make build BUILD_DIR="${BUILD_TYPE}" BUILD_BASE_DIR="${BUILD_BASE_DIR}"
# Configure sccache if enabled
if [ "$ENABLE_SCCACHE" = "ON" ]; then \
# Run sccache setup script
/sccache_setup.sh && \
# Add sccache CMake flags
EXTRA_CMAKE_FLAGS="${EXTRA_CMAKE_FLAGS} -DCMAKE_C_COMPILER_LAUNCHER=sccache -DCMAKE_CXX_COMPILER_LAUNCHER=sccache -DCMAKE_CUDA_COMPILER_LAUNCHER=sccache" && \
echo "sccache distributed status:" && \
sccache --dist-status && \
echo "Pre-build sccache (zeroed out) statistics:" && \
sccache --show-stats; \
fi && \
make cmake BUILD_DIR="${BUILD_TYPE}" BUILD_TYPE="${BUILD_TYPE}" EXTRA_CMAKE_FLAGS="${EXTRA_CMAKE_FLAGS}" BUILD_BASE_DIR="${BUILD_BASE_DIR}" && \
make build BUILD_DIR="${BUILD_TYPE}" BUILD_BASE_DIR="${BUILD_BASE_DIR}" && \
# Show final sccache stats if enabled
if [ "$ENABLE_SCCACHE" = "ON" ]; then \
echo "Post-build sccache statistics:" && \
sccache --show-stats; \
fi
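
The conditional launcher wiring in the `RUN` step above can be sketched in isolation as follows. This is an illustrative sketch rather than the PR's code: the function name `with_sccache_launchers` and the placeholder flag `-DEXAMPLE_FLAG=ON` are hypothetical, while the three `CMAKE_*_COMPILER_LAUNCHER` variables are the ones the Dockerfile sets.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the CMake flags, appending the sccache launcher
# settings only when the first argument is "ON".
with_sccache_launchers() {
  local enable="$1" flags="${2:-}"
  if [[ "$enable" == "ON" ]]; then
    local lang
    for lang in C CXX CUDA; do
      flags+=" -DCMAKE_${lang}_COMPILER_LAUNCHER=sccache"
    done
  fi
  # Trim the leading space that appears when no base flags were supplied.
  echo "${flags# }"
}

# -DEXAMPLE_FLAG=ON is a placeholder for whatever flags are already set.
with_sccache_launchers ON "-DEXAMPLE_FLAG=ON"
with_sccache_launchers OFF "-DEXAMPLE_FLAG=ON"
```

With a launcher set, CMake prefixes each compiler invocation with `sccache`, so the compiler itself and its flags (including `-Werror`) are unchanged.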
63 changes: 63 additions & 0 deletions velox/docker/sccache/sccache_setup.sh
@@ -0,0 +1,63 @@
#!/bin/bash
set -euo pipefail

# Check for required auth files
if [[ ! -f /sccache_auth/github_token ]]; then
echo "ERROR: GitHub token not found at /sccache_auth/github_token"
exit 1
fi

if [[ ! -f /sccache_auth/aws_credentials ]]; then
echo "ERROR: AWS credentials not found at /sccache_auth/aws_credentials"
exit 1
fi

# Set up directories
mkdir -p ~/.config/sccache ~/.aws

# Install AWS credentials
cp /sccache_auth/aws_credentials ~/.aws/credentials

Contributor

Will this override existing credential setup?

Contributor Author

This also only affects the ~/.aws/credentials in the docker container that is used to build velox, it does not affect the host.


# Read GitHub token
GITHUB_TOKEN=$(cat /sccache_auth/github_token | tr -d '\n\r ')

# Create sccache config
SCCACHE_ARCH=$(uname -m | sed 's/x86_64/amd64/')

cat > ~/.config/sccache/config << SCCACHE_EOF
[cache.disk]
size = 107374182400

[cache.disk.preprocessor_cache_mode]
use_preprocessor_cache_mode = true

[cache.s3]
bucket = "rapids-sccache-devs"
region = "us-east-2"
no_credentials = false

[dist]
scheduler_url = "https://${SCCACHE_ARCH}.linux.sccache.rapids.nvidia.com"

[dist.auth]
type = "token"
token = "${GITHUB_TOKEN}"
SCCACHE_EOF

# Configure sccache for high parallelism
# Increase file descriptor limit for high parallelism (if possible)
ulimit -n $(ulimit -Hn) || echo "Could not increase file descriptor limit"

# Start sccache server
sccache --start-server

# Test sccache
sccache --show-stats

# Testing distributed compilation status
if sccache --dist-status; then
echo "Distributed compilation is available"
else
echo "Error: Distributed compilation not available, check connectivity"
exit 1
fi
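
For reference, the architecture mapping that feeds `scheduler_url` can be exercised on its own. The sketch below assumes only what the script shows (the `sed` substitution and the URL pattern); the function name `sccache_scheduler_url` is hypothetical. Note that only `x86_64` is rewritten, so any other `uname -m` value, such as `aarch64`, passes through unchanged.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical wrapper around the script's mapping: x86_64 becomes amd64,
# every other machine name is used as-is.
sccache_scheduler_url() {
  local machine="${1:-$(uname -m)}"
  local arch
  arch=$(printf '%s\n' "$machine" | sed 's/x86_64/amd64/')
  echo "https://${arch}.linux.sccache.rapids.nvidia.com"
}

sccache_scheduler_url x86_64    # prints https://amd64.linux.sccache.rapids.nvidia.com
sccache_scheduler_url aarch64   # prints https://aarch64.linux.sccache.rapids.nvidia.com
```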
51 changes: 51 additions & 0 deletions velox/docker/sccache_auth.dockerfile

Contributor

Is this dockerfile based on an existing dockerfile or documentation?

Contributor Author

This dockerfile is largely based on documentation in a Slack channel, which I can link offline.

@@ -0,0 +1,51 @@
FROM ubuntu:22.04

# Prevent interactive prompts during package installation
ENV DEBIAN_FRONTEND=noninteractive

# Install basic dependencies
RUN <<EOF
apt-get update && apt-get install -y \
curl \
wget \
ca-certificates \
gnupg \
lsb-release \
&& rm -rf /var/lib/apt/lists/*
EOF

# Install GitHub CLI
RUN <<EOF
curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg | dd of=/usr/share/keyrings/githubcli-archive-keyring.gpg
chmod go+r /usr/share/keyrings/githubcli-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" | tee /etc/apt/sources.list.d/github-cli.list > /dev/null
apt-get update
apt-get install gh -y
rm -rf /var/lib/apt/lists/*
EOF

# Install gh-nv-gha-aws plugin manually
RUN <<EOF
NV_GHA_AWS_VERSION="0.1.1"
ARCH=$(dpkg --print-architecture)
if [ "$ARCH" = "amd64" ]; then ARCH="amd64"; elif [ "$ARCH" = "arm64" ]; then ARCH="arm64"; fi
mkdir -p /root/.local/share/gh/extensions/gh-nv-gha-aws
wget --no-hsts -q -O /root/.local/share/gh/extensions/gh-nv-gha-aws/gh-nv-gha-aws \
"https://github.com/nv-gha-runners/gh-nv-gha-aws/releases/download/v${NV_GHA_AWS_VERSION}/gh-nv-gha-aws_v${NV_GHA_AWS_VERSION}_linux-${ARCH}"
chmod 0755 /root/.local/share/gh/extensions/gh-nv-gha-aws/gh-nv-gha-aws
EOF

# Create plugin manifest
RUN <<EOF
cat > /root/.local/share/gh/extensions/gh-nv-gha-aws/manifest.yml << 'MANIFEST'
owner: nv-gha-runners
name: gh-nv-gha-aws
host: github.com
tag: v0.1.1
ispinned: false
path: $HOME/.local/share/gh/extensions/gh-nv-gha-aws/gh-nv-gha-aws
MANIFEST
EOF

# Create output directory for credentials
RUN mkdir -p /output
74 changes: 72 additions & 2 deletions velox/scripts/build_velox.sh
@@ -27,6 +27,17 @@ BUILD_TYPE="release"
LOG_ENABLED=false
TREAT_WARNINGS_AS_ERRORS="${TREAT_WARNINGS_AS_ERRORS:-1}"
LOGFILE="./build_velox.log"
ENABLE_SCCACHE=false
SCCACHE_AUTH_DIR=""

# Cleanup function to remove copied sccache auth files
cleanup_sccache_auth() {
if [[ "$ENABLE_SCCACHE" == true && -d "../docker/sccache/sccache_auth/" ]]; then
rm -f ../docker/sccache/sccache_auth/github_token ../docker/sccache/sccache_auth/aws_credentials

Contributor

Nit: Delete the ../docker/sccache/sccache_auth directory?

fi
}

trap cleanup_sccache_auth EXIT SIGTERM SIGINT SIGQUIT

print_help() {
cat <<EOF
@@ -43,6 +54,8 @@ Options:
--gpu Build with GPU support (enables CUDF; sets BUILD_WITH_VELOX_ENABLE_CUDF=ON) [default].
-j|--num-threads NUM Number of threads to use for building (default: 3/4 of CPU cores).
--benchmarks true|false Enable benchmarks and nsys profiling tools (default: true).
--sccache Enable sccache distributed compilation caching.
--sccache-auth-dir DIR Directory containing sccache authentication files (github_token, aws_credentials).

Contributor

Does this have to be an argument given that this directory is created by the setup script?

Contributor Author

I can remove this option and we can just rely on the relevant environment variable.

--build-type TYPE Build type: Release, Debug, or RelWithDebInfo (case insensitive, default: release).
-h, --help Show this help message and exit.

@@ -57,6 +70,7 @@ Examples:
$(basename "$0") --log mybuild.log --all-cuda-archs
$(basename "$0") -j 8 --gpu
$(basename "$0") --num-threads 16 --no-cache
$(basename "$0") --sccache --sccache-auth-dir /auth_dir/ # Build with sccache and use auth files in /auth_dir/
$(basename "$0") --build-type Debug
$(basename "$0") --build-type debug --gpu
$(basename "$0") --build-type RELWITHDEBINFO --gpu
@@ -127,6 +141,18 @@ parse_args() {
exit 1
fi
;;
--sccache)
ENABLE_SCCACHE=true
shift
;;
--sccache-auth-dir)
if [[ -z "${2:-}" || "${2}" =~ ^- ]]; then
echo "Error: --sccache-auth-dir requires a directory path"
exit 1
fi
SCCACHE_AUTH_DIR="$2"
shift 2

Contributor

;; needed

Contributor Author

Oops this looks like a merge error by the automated merge in Cursor/VS Code.

;;
--build-type)
if [[ -n "${2:-}" && ! "${2}" =~ ^- ]]; then
# Convert to lowercase first, then validate
@@ -159,6 +185,30 @@ parse_args() {
done
}

# Validate sccache authentication
validate_sccache_auth() {
if [[ "$ENABLE_SCCACHE" == true ]]; then

if [[ -n "$SCCACHE_AUTH_DIR" ]]; then
if [[ ! -d "$SCCACHE_AUTH_DIR" ]]; then
echo "ERROR: sccache auth directory not found: $SCCACHE_AUTH_DIR"
exit 1
fi
if [[ ! -f "$SCCACHE_AUTH_DIR/github_token" ]]; then
echo "ERROR: GitHub token not found: $SCCACHE_AUTH_DIR/github_token"
exit 1
fi
if [[ ! -f "$SCCACHE_AUTH_DIR/aws_credentials" ]]; then
echo "ERROR: AWS credentials not found: $SCCACHE_AUTH_DIR/aws_credentials"
exit 1
fi
echo "sccache authentication files found in: $SCCACHE_AUTH_DIR"
else
echo "ERROR: No sccache auth directory provided but sccache is enabled. Run setup_sccache_auth.sh first."
exit 1
fi
fi
}

# Detect CUDA architecture since native architecture detection doesn't work
# inside Docker containers
@@ -177,10 +227,11 @@ detect_cuda_architecture() {
fi
}



parse_args "$@"

# Validate sccache authentication if sccache is enabled
validate_sccache_auth

# Validate repo layout using shared script
../../scripts/validate_directories_exist.sh "../../../velox"

@@ -204,6 +255,17 @@ DOCKER_BUILD_OPTS+=(--build-arg VELOX_ENABLE_BENCHMARKS="${VELOX_ENABLE_BENCHMAR
DOCKER_BUILD_OPTS+=(--build-arg TREAT_WARNINGS_AS_ERRORS="${TREAT_WARNINGS_AS_ERRORS}")
DOCKER_BUILD_OPTS+=(--build-arg BUILD_TYPE="${BUILD_TYPE}")

# Add sccache build arguments
if [[ "$ENABLE_SCCACHE" == true ]]; then
DOCKER_BUILD_OPTS+=(--build-arg ENABLE_SCCACHE="ON")
# Copy auth files to build context

Contributor

Should this be cleaned up later?

Contributor Author

Sure, I put in a trap.

mkdir -p ../docker/sccache/sccache_auth/
cp "$SCCACHE_AUTH_DIR/github_token" ../docker/sccache/sccache_auth/
cp "$SCCACHE_AUTH_DIR/aws_credentials" ../docker/sccache/sccache_auth/
else
DOCKER_BUILD_OPTS+=(--build-arg ENABLE_SCCACHE="OFF")
fi

if [[ "$LOG_ENABLED" == true ]]; then
echo "Logging build output to $LOGFILE"
docker compose -f "$COMPOSE_FILE" build "${DOCKER_BUILD_OPTS[@]}" | tee "$LOGFILE"
@@ -213,6 +275,7 @@ else
BUILD_EXIT_CODE=$?
fi


if [[ "$BUILD_EXIT_CODE" == "0" ]]; then
# Update EXPECTED_OUTPUT_DIR to use the correct build directory
EXPECTED_OUTPUT_DIR="/opt/velox-build/${BUILD_TYPE}"
@@ -235,6 +298,13 @@ if [[ "$BUILD_EXIT_CODE" == "0" ]]; then
else
echo " Benchmarks and nsys profiling are disabled in this build."
fi
if [[ "$ENABLE_SCCACHE" == true ]]; then
echo " sccache distributed compilation caching was enabled for this build."
if [[ -n "$SCCACHE_AUTH_DIR" ]]; then
echo " To check sccache stats, run:"
echo " docker compose -f $COMPOSE_FILE run --rm ${CONTAINER_NAME} sccache --show-stats"
fi
fi
echo ""
else
echo " ERROR: Build succeeded but ${EXPECTED_OUTPUT_DIR} not found in the container."
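
The staging and cleanup of auth files discussed in the review threads on `build_velox.sh` can be sketched as a pair of helpers. This is one possible shape, not the PR's implementation: the function names `stage_sccache_auth` and `unstage_sccache_auth` are hypothetical, and the demonstration uses throwaway `mktemp` directories in place of `--sccache-auth-dir` and `../docker/sccache/sccache_auth/`. Unlike `cleanup_sccache_auth`, the cleanup here removes the whole staged directory, as suggested in the review nit.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: copy the two auth files into a staging directory
# inside the Docker build context.
stage_sccache_auth() {
  local src="$1" dest="$2"
  mkdir -p "$dest"
  cp "$src/github_token" "$src/aws_credentials" "$dest/"
}

# Hypothetical helper: remove the whole staging directory, not only the files
# inside it. The guards avoid running rm -rf on an empty or missing path.
unstage_sccache_auth() {
  local dest="${1:-}"
  if [[ -n "$dest" && -d "$dest" ]]; then
    rm -rf -- "$dest"
  fi
}

# Demonstration with throwaway directories standing in for the real paths.
AUTH_SRC=$(mktemp -d)
STAGE_DIR="$(mktemp -d)/sccache_auth"
echo "dummy-token" > "$AUTH_SRC/github_token"
echo "dummy-credentials" > "$AUTH_SRC/aws_credentials"

# Clean up the staged copy (and the dummy source) however the script exits.
trap 'unstage_sccache_auth "$STAGE_DIR"; rm -rf -- "$AUTH_SRC"' EXIT
stage_sccache_auth "$AUTH_SRC" "$STAGE_DIR"
ls "$STAGE_DIR"
```

Calling `unstage_sccache_auth` more than once is harmless, which matters when the same cleanup is registered for several signals as well as `EXIT`.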