Merged
Changes from 16 commits
45 changes: 43 additions & 2 deletions .github/workflows/velox-test.yml
@@ -88,12 +88,51 @@ jobs:
working-directory: ${{ github.workspace }}/velox-testing/velox/scripts
run: ./build_centos_deps_image.sh

- &setup-sccache
name: Setup sccache authentication
env:
GH_TOKEN: ${{ github.token }}
SCCACHE_AWS_ACCESS_KEY_ID: ${{ secrets.SCCACHE_AWS_ACCESS_KEY_ID }}
SCCACHE_AWS_SECRET_ACCESS_KEY: ${{ secrets.SCCACHE_AWS_SECRET_ACCESS_KEY }}
SCCACHE_AWS_SESSION_TOKEN: ${{ secrets.SCCACHE_AWS_SESSION_TOKEN }}
working-directory: ${{ github.workspace }}/velox-testing/velox/scripts
run: |
# Check if AWS credentials are available
if [[ -z "$SCCACHE_AWS_ACCESS_KEY_ID" ]] || [[ -z "$SCCACHE_AWS_SECRET_ACCESS_KEY" ]]; then
echo "Warning: SCCACHE_AWS_ACCESS_KEY_ID or SCCACHE_AWS_SECRET_ACCESS_KEY not found in secrets"
echo "Skipping sccache setup"
exit 0
fi

# Setup sccache auth directory
mkdir -p $HOME/.sccache-auth

# Save GitHub token
echo "$GH_TOKEN" > $HOME/.sccache-auth/github_token
chmod 600 $HOME/.sccache-auth/github_token

# Create AWS credentials file
cat > $HOME/.sccache-auth/aws_credentials << EOF
[default]
aws_access_key_id = $SCCACHE_AWS_ACCESS_KEY_ID
aws_secret_access_key = $SCCACHE_AWS_SECRET_ACCESS_KEY

Review comment (Contributor): Uses the default AWS credentials available through GHA secrets instead of generating them every time. The same applies for the GitHub token.

Reply (Contributor, Author): I don't think we want to make these changes in this PR, enabling sccache for CI is a separate work item which will have a follow-up PR.

EOF

# Add session token if available (for temporary credentials)
if [[ -n "$SCCACHE_AWS_SESSION_TOKEN" ]]; then
echo "aws_session_token = $SCCACHE_AWS_SESSION_TOKEN" >> $HOME/.sccache-auth/aws_credentials
fi

chmod 600 $HOME/.sccache-auth/aws_credentials
echo "SCCACHE_AUTH_DIR=$HOME/.sccache-auth" >> $GITHUB_ENV
echo "Sccache authentication setup complete"
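The step above writes the credentials file with a heredoc and only afterwards tightens its mode with `chmod 600`. A minimal sketch of an alternative, assuming it is acceptable to set a restrictive `umask` first so the file is created private from the start; the function name `write_aws_credentials` and its argument order are hypothetical and not part of this PR:

```bash
#!/usr/bin/env bash
# Hypothetical helper: write an AWS credentials file that is created with
# owner-only permissions. An already existing file keeps its current mode.
write_aws_credentials() {
  local auth_dir="$1" key_id="$2" secret="$3" session_token="${4:-}"
  (
    # The subshell keeps the umask change from leaking into the caller.
    umask 077
    mkdir -p "$auth_dir"
    {
      printf '[default]\n'
      printf 'aws_access_key_id = %s\n' "$key_id"
      printf 'aws_secret_access_key = %s\n' "$secret"
      # Temporary credentials carry a session token; static ones do not.
      if [[ -n "$session_token" ]]; then
        printf 'aws_session_token = %s\n' "$session_token"
      fi
    } > "$auth_dir/aws_credentials"
  )
}
```

In the workflow this could be called as `write_aws_credentials "$HOME/.sccache-auth" "$SCCACHE_AWS_ACCESS_KEY_ID" "$SCCACHE_AWS_SECRET_ACCESS_KEY" "$SCCACHE_AWS_SESSION_TOKEN"`.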

- name: Building Velox for CPU
run: |
echo " - CUDF support: DISABLED"
echo " - Target: CPU-only build"
cd ${{ github.workspace }}/velox-testing/velox/scripts
- TREAT_WARNINGS_AS_ERRORS=0 ./build_velox.sh --all-cuda-archs --cpu --log build_velox_cpu.log
+ TREAT_WARNINGS_AS_ERRORS=0 ./build_velox.sh --all-cuda-archs --cpu --sccache --log build_velox_cpu.log

- name: Upload velox_cpu_build_log as an artifact
uses: actions/upload-artifact@v4
@@ -123,12 +162,14 @@ jobs:

- *build-deps

- *setup-sccache

- name: Building Velox for GPU
working-directory: ${{ github.workspace }}/velox-testing/velox/scripts
run: |
echo " - CUDF support: ENABLED"
echo " - Target: GPU-accelerated build"
- TREAT_WARNINGS_AS_ERRORS=0 ./build_velox.sh --all-cuda-archs --gpu --log build_velox_gpu.log
+ TREAT_WARNINGS_AS_ERRORS=0 ./build_velox.sh --all-cuda-archs --gpu --sccache --log build_velox_gpu.log

- name: Upload velox_gpu_build_log as an artifact
uses: actions/upload-artifact@v4
23 changes: 23 additions & 0 deletions README.md
@@ -21,6 +21,29 @@ A Docker-based build infrastructure has been added to facilitate building Velox

Specifically, the `velox-testing` and `velox` repositories must be checked out as sibling directories under the same parent directory. Once that is done, navigate (`cd`) into the `velox-testing/velox/scripts` directory and execute the build script `build_velox.sh`. After a successful build, the Velox libraries and executables are available in the container at `/opt/velox-build/release`.

## `sccache` Usage
`sccache` has been integrated to significantly accelerate builds using remote S3 caching and optional distributed compilation. Currently supported for Velox builds only (not Presto).

The fork `rapidsai/sccache` is integrated and configured for use with the `NVIDIA` GitHub organization.

### Setup and Usage
First, set up authentication credentials:
```bash
cd velox-testing/velox/scripts
./setup_sccache_auth.sh

Review comment (Contributor): Running this script results in the following error:

ERROR: failed to build: failed to solve: failed to read dockerfile: open sccache_auth.dockerfile: no such file or directory

Reply (Contributor, Author): Oh oops, I think I forgot to include a dockerfile in this PR, my bad.

```

Then build Velox with sccache enabled:
```bash
# Default: Remote S3 cache + local compilation (recommended)
./build_velox.sh --sccache

# Optional: Enable distributed compilation (may cause build differences such as additional warnings)
./build_velox.sh --sccache --sccache-enable-dist
```

Authentication files are stored in `~/.sccache-auth/` by default and credentials are valid for 12 hours. By default, distributed compilation is disabled to avoid compiler version differences that can cause build failures.
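
The 12-hour validity window suggests checking the age of the auth files before starting a long build. A minimal sketch, assuming `setup_sccache_auth.sh` writes the same two file names as the CI step (`github_token` and `aws_credentials`) and that file modification time is a fair proxy for when the credentials were issued; the function name `sccache_auth_is_fresh` is hypothetical and not part of the repository:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if both auth files exist and were
# modified within the last MAX_AGE_HOURS hours (default 12).
sccache_auth_is_fresh() {
  local auth_dir="${1:-$HOME/.sccache-auth}"
  local max_age_hours="${2:-12}"
  local max_age_minutes=$(( max_age_hours * 60 ))
  local f
  for f in github_token aws_credentials; do
    # Missing file: credentials were never set up.
    [[ -f "$auth_dir/$f" ]] || return 1
    # find prints the path only if it was modified less than N minutes ago.
    [[ -n "$(find "$auth_dir/$f" -mmin "-$max_age_minutes")" ]] || return 1
  done
  return 0
}
```

A possible use is `sccache_auth_is_fresh || ./setup_sccache_auth.sh` before running `./build_velox.sh --sccache`.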

## Velox Benchmarking
A Docker-based benchmarking infrastructure has been added to facilitate running Velox benchmarks with support for CPU/GPU execution engines and profiling capabilities. The infrastructure uses a dedicated `velox-benchmark` Docker service with pre-configured volume mounts that automatically sync benchmark data and results. The data follows Hive directory structure, making it compatible with Presto. Currently, only TPC-H is implemented, but the infrastructure is designed to be easily extended to support additional benchmarks in the future.

56 changes: 48 additions & 8 deletions velox/docker/adapters_build.dockerfile
@@ -11,6 +11,8 @@ ARG TREAT_WARNINGS_AS_ERRORS=1
ARG VELOX_ENABLE_BENCHMARKS=ON
ARG BUILD_BASE_DIR=/opt/velox-build
ARG BUILD_TYPE=release
ARG ENABLE_SCCACHE=OFF
ARG SCCACHE_DISABLE_DIST=ON

# Environment mirroring upstream CI defaults and incorporating build args
ENV VELOX_DEPENDENCY_SOURCE=SYSTEM \
@@ -40,18 +42,33 @@ ENV VELOX_DEPENDENCY_SOURCE=SYSTEM \
-DVELOX_ENABLE_CUDF=${BUILD_WITH_VELOX_ENABLE_CUDF} \
-DVELOX_ENABLE_FAISS=ON" \
LD_LIBRARY_PATH="${BUILD_BASE_DIR}/${BUILD_TYPE}/lib:\
${BUILD_BASE_DIR}/${BUILD_TYPE}/_deps/cudf-build:\
${BUILD_BASE_DIR}/${BUILD_TYPE}/_deps/rmm-build:\
${BUILD_BASE_DIR}/${BUILD_TYPE}/_deps/rapids_logger-build:\
${BUILD_BASE_DIR}/${BUILD_TYPE}/_deps/kvikio-build:\
${BUILD_BASE_DIR}/${BUILD_TYPE}/_deps/nvcomp_proprietary_binary-src/lib64" \
${BUILD_BASE_DIR}/${BUILD_TYPE}/_deps/cudf-build:\
${BUILD_BASE_DIR}/${BUILD_TYPE}/_deps/rmm-build:\
${BUILD_BASE_DIR}/${BUILD_TYPE}/_deps/rapids_logger-build:\
${BUILD_BASE_DIR}/${BUILD_TYPE}/_deps/kvikio-build:\
${BUILD_BASE_DIR}/${BUILD_TYPE}/_deps/nvcomp_proprietary_binary-src/lib64" \
ENABLE_SCCACHE=${ENABLE_SCCACHE} \
SCCACHE_DISABLE_DIST=${SCCACHE_DISABLE_DIST} \
CCACHE_DIR=/ccache

WORKDIR /workspace/velox

# Print environment variables for debugging
RUN printenv | sort

# Install sccache if enabled
RUN if [ "$ENABLE_SCCACHE" = "ON" ]; then \
set -euxo pipefail && \
# Install RAPIDS sccache fork
wget --no-hsts -q -O- "https://github.com/rapidsai/sccache/releases/download/v0.10.0-rapids.68/sccache-v0.10.0-rapids.68-$(uname -m)-unknown-linux-musl.tar.gz" | \
tar -C /usr/bin -zf - --wildcards --strip-components=1 -x '*/sccache' 2>/dev/null && \
chmod +x /usr/bin/sccache && \
# Verify installation
sccache --version; \
else \
echo "Skipping sccache installation (ENABLE_SCCACHE=OFF)"; \
fi

# Install NVIDIA Nsight Systems (nsys) for profiling - only if benchmarks are enabled
RUN if [ "$VELOX_ENABLE_BENCHMARKS" = "ON" ]; then \
set -euxo pipefail && \
@@ -68,9 +85,32 @@ RUN if [ "$VELOX_ENABLE_BENCHMARKS" = "ON" ]; then \
echo "Skipping nsys installation (VELOX_ENABLE_BENCHMARKS=OFF)"; \
fi

# Build using the specified build type and directory
# Copy sccache setup script (if sccache enabled)
COPY velox-testing/velox/docker/sccache/sccache_setup.sh /sccache_setup.sh
RUN if [ "$ENABLE_SCCACHE" = "ON" ]; then chmod +x /sccache_setup.sh; fi

# Copy sccache auth files (note source of copy must be within the docker build context)
COPY velox-testing/velox/docker/sccache/sccache_auth/ /sccache_auth/

# Build in Release mode into ${BUILD_BASE_DIR}
RUN --mount=type=bind,source=velox,target=/workspace/velox,ro \
--mount=type=cache,target=/ccache \
set -euxo pipefail && \
- make cmake BUILD_DIR="${BUILD_TYPE}" BUILD_TYPE="${BUILD_TYPE}" EXTRA_CMAKE_FLAGS="${EXTRA_CMAKE_FLAGS[*]}" BUILD_BASE_DIR="${BUILD_BASE_DIR}" && \
- make build BUILD_DIR="${BUILD_TYPE}" BUILD_BASE_DIR="${BUILD_BASE_DIR}"
# Configure sccache if enabled
if [ "$ENABLE_SCCACHE" = "ON" ]; then \
# Run sccache setup script
/sccache_setup.sh && \
# Add sccache CMake flags
EXTRA_CMAKE_FLAGS="${EXTRA_CMAKE_FLAGS} -DCMAKE_C_COMPILER_LAUNCHER=sccache -DCMAKE_CXX_COMPILER_LAUNCHER=sccache -DCMAKE_CUDA_COMPILER_LAUNCHER=sccache" && \
echo "sccache distributed status:" && \
sccache --dist-status && \
echo "Pre-build sccache (zeroed out) statistics:" && \
sccache --show-stats; \
fi && \
make cmake BUILD_DIR="${BUILD_TYPE}" BUILD_TYPE="${BUILD_TYPE}" EXTRA_CMAKE_FLAGS="${EXTRA_CMAKE_FLAGS}" BUILD_BASE_DIR="${BUILD_BASE_DIR}" && \
make build BUILD_DIR="${BUILD_TYPE}" BUILD_BASE_DIR="${BUILD_BASE_DIR}" && \
# Show final sccache stats if enabled
if [ "$ENABLE_SCCACHE" = "ON" ]; then \
echo "Post-build sccache statistics:" && \
sccache --show-stats; \
fi
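
The build step above enables sccache through CMake's `CMAKE_<LANG>_COMPILER_LAUNCHER` variables, which make CMake prefix each compile command for that language with the named tool. A minimal sketch of the same conditional append as a standalone function, assuming the flags are passed around as a single space-separated string as in the dockerfile; the function name `cmake_flags_with_sccache` is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the CMake flags to use, appending the
# compiler-launcher settings only when sccache is enabled.
cmake_flags_with_sccache() {
  local base_flags="$1" enable_sccache="${2:-OFF}"
  if [[ "$enable_sccache" == "ON" ]]; then
    local lang
    # Same three languages as the dockerfile: C, C++, and CUDA.
    for lang in C CXX CUDA; do
      base_flags+=" -DCMAKE_${lang}_COMPILER_LAUNCHER=sccache"
    done
  fi
  printf '%s\n' "$base_flags"
}
```

Keeping the append in one place means the CPU and GPU builds cannot drift apart in which languages are routed through the cache.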
85 changes: 85 additions & 0 deletions velox/docker/sccache/sccache_setup.sh
@@ -0,0 +1,85 @@
#!/bin/bash
set -euo pipefail

# Check for required auth files
if [[ ! -f /sccache_auth/github_token ]]; then
echo "ERROR: GitHub token not found at /sccache_auth/github_token"
exit 1
fi

if [[ ! -f /sccache_auth/aws_credentials ]]; then
echo "ERROR: AWS credentials not found at /sccache_auth/aws_credentials"
exit 1
fi

# Set up directories
mkdir -p ~/.config/sccache ~/.aws

# Install AWS credentials (safe in Docker container environment)
cp /sccache_auth/aws_credentials ~/.aws/credentials

Review comment (Contributor): Will this override existing credential setup?

Reply (Contributor, Author): This also only affects the ~/.aws/credentials in the Docker container that is used to build Velox; it does not affect the host.


# Read GitHub token
GITHUB_TOKEN=$(cat /sccache_auth/github_token | tr -d '\n\r ')

# Create sccache config
SCCACHE_ARCH=$(uname -m | sed 's/x86_64/amd64/')

# Check if we should disable distributed compilation (disabled by default)
if [[ "${SCCACHE_DISABLE_DIST:-ON}" == "ON" ]]; then
cat > ~/.config/sccache/config << SCCACHE_EOF
[cache.disk]
size = 107374182400

[cache.disk.preprocessor_cache_mode]
use_preprocessor_cache_mode = true

[cache.s3]
bucket = "rapids-sccache-devs"
region = "us-east-2"
no_credentials = false

# No [dist] section -> disables distributed compilation
SCCACHE_EOF
else
cat > ~/.config/sccache/config << SCCACHE_EOF
[cache.disk]
size = 107374182400

[cache.disk.preprocessor_cache_mode]
use_preprocessor_cache_mode = true

[cache.s3]
bucket = "rapids-sccache-devs"
region = "us-east-2"
no_credentials = false

[dist]
scheduler_url = "https://${SCCACHE_ARCH}.linux.sccache.rapids.nvidia.com"

[dist.auth]
type = "token"
token = "${GITHUB_TOKEN}"
SCCACHE_EOF
fi

# Configure sccache for high parallelism
# Increase file descriptor limit for high parallelism (if possible)
ulimit -n $(ulimit -Hn) || echo "Could not increase file descriptor limit"

# Start sccache server
sccache --start-server

# Test sccache
sccache --show-stats

# Testing distributed compilation status (only if enabled)
if [[ "${SCCACHE_DISABLE_DIST:-ON}" == "ON" ]]; then
echo "Distributed compilation is DISABLED by default - using local compilation with remote S3 caching"
else
if sccache --dist-status; then
echo "Distributed compilation is available"
else
echo "Error: Distributed compilation not available, check connectivity"
exit 1
fi
fi
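
The two heredocs in the script above differ only in the `[dist]` and `[dist.auth]` sections. A minimal sketch of one way to express that, assuming the shared cache settings can be emitted once and the distributed sections appended only when requested; the function name `render_sccache_config` and its positional arguments are hypothetical, while the bucket, region, and scheduler URL are copied from the script:

```bash
#!/usr/bin/env bash
# Hypothetical refactor: print the shared cache settings once, then append the
# [dist] sections only when distributed compilation is requested.
render_sccache_config() {
  local disable_dist="${1:-ON}" arch="${2:-}" token="${3:-}"
  # Quoted delimiter: this part contains no variables to expand.
  cat << 'CACHE_EOF'
[cache.disk]
size = 107374182400

[cache.disk.preprocessor_cache_mode]
use_preprocessor_cache_mode = true

[cache.s3]
bucket = "rapids-sccache-devs"
region = "us-east-2"
no_credentials = false
CACHE_EOF
  if [[ "$disable_dist" != "ON" ]]; then
    # Unquoted delimiter: arch and token are substituted here.
    cat << DIST_EOF

[dist]
scheduler_url = "https://${arch}.linux.sccache.rapids.nvidia.com"

[dist.auth]
type = "token"
token = "${token}"
DIST_EOF
  fi
}
```

In the setup script this could be used as `render_sccache_config "${SCCACHE_DISABLE_DIST:-ON}" "$SCCACHE_ARCH" "$GITHUB_TOKEN" > ~/.config/sccache/config`.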
51 changes: 51 additions & 0 deletions velox/docker/sccache_auth.dockerfile

Review comment (Contributor): Is this dockerfile based on an existing dockerfile or documentation?

Reply (Contributor, Author): This dockerfile is largely based on documentation in a Slack channel, which I can link offline.

@@ -0,0 +1,51 @@
FROM ubuntu:22.04

# Prevent interactive prompts during package installation
ENV DEBIAN_FRONTEND=noninteractive

# Install basic dependencies
RUN <<EOF
apt-get update && apt-get install -y \
curl \
wget \
ca-certificates \
gnupg \
lsb-release \
&& rm -rf /var/lib/apt/lists/*
EOF

# Install GitHub CLI
RUN <<EOF
curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg | dd of=/usr/share/keyrings/githubcli-archive-keyring.gpg
chmod go+r /usr/share/keyrings/githubcli-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" | tee /etc/apt/sources.list.d/github-cli.list > /dev/null
apt-get update
apt-get install gh -y
rm -rf /var/lib/apt/lists/*
EOF

# Install gh-nv-gha-aws plugin manually
RUN <<EOF
NV_GHA_AWS_VERSION="0.1.1"
ARCH=$(dpkg --print-architecture)
if [ "$ARCH" = "amd64" ]; then ARCH="amd64"; elif [ "$ARCH" = "arm64" ]; then ARCH="arm64"; fi
mkdir -p /root/.local/share/gh/extensions/gh-nv-gha-aws
wget --no-hsts -q -O /root/.local/share/gh/extensions/gh-nv-gha-aws/gh-nv-gha-aws \
"https://github.com/nv-gha-runners/gh-nv-gha-aws/releases/download/v${NV_GHA_AWS_VERSION}/gh-nv-gha-aws_v${NV_GHA_AWS_VERSION}_linux-${ARCH}"
chmod 0755 /root/.local/share/gh/extensions/gh-nv-gha-aws/gh-nv-gha-aws
EOF

# Create plugin manifest
RUN <<EOF
cat > /root/.local/share/gh/extensions/gh-nv-gha-aws/manifest.yml << 'MANIFEST'
owner: nv-gha-runners
name: gh-nv-gha-aws
host: github.com
tag: v0.1.1
ispinned: false
path: $HOME/.local/share/gh/extensions/gh-nv-gha-aws/gh-nv-gha-aws
MANIFEST
EOF

# Create output directory for credentials
RUN mkdir -p /output
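
In the plugin install step above, the `ARCH` check maps `amd64` to `amd64` and `arm64` to `arm64`, so it changes nothing, and any other architecture falls through to the download. A minimal sketch of a mapping that fails early instead, assuming release assets exist only for `amd64` and `arm64` (that asset list is an assumption, not something this PR states); the function name `release_arch` is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a machine or dpkg architecture name to the suffix
# assumed to be used in release asset names, failing on anything else.
release_arch() {
  case "$1" in
    x86_64 | amd64) echo "amd64" ;;
    aarch64 | arm64) echo "arm64" ;;
    *)
      echo "unsupported architecture: $1" >&2
      return 1
      ;;
  esac
}
```

Inside the `RUN` block this could be used as `ARCH=$(release_arch "$(dpkg --print-architecture)") || exit 1`, so an unsupported platform stops the image build with a clear message.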