Skip to content
Merged
Changes from all commits
Commits
Show all changes
20 commits
Select commit Hold shift + click to select a range
6f45c91
Use container os disk (az vm temp) vs. remote disk for build
amd-justchen Apr 21, 2025
261df51
Add diskpart show disk output to debug
amd-justchen Apr 21, 2025
d4d05f1
Add post build disk usage
amd-justchen Apr 21, 2025
07dddcb
Use powershell over diskpart due to errors
amd-justchen Apr 21, 2025
6b9e540
Remove mkdir for build directory to use emptydir
amd-justchen Apr 21, 2025
cd0e1b8
Put ccache into /tmpbuild/ and zero before build
amd-justchen Apr 21, 2025
65b7eac
Remove clean up build dir since using temp disk
amd-justchen Apr 21, 2025
f06fd68
Restore ccache dir to gha work dir to debug
amd-justchen Apr 21, 2025
46452d9
Use ccache-dir mount
amd-justchen Apr 21, 2025
2e77203
Use GH workspace dir for cache and src
amd-justchen Apr 21, 2025
2497a22
Use B: to build to allow viewing disk space
amd-justchen Apr 21, 2025
4a78f6d
Re-enable mkdir for creating build dir on B:
amd-justchen Apr 21, 2025
bd69247
Add output for logical processor (vcpu) count
amd-justchen Apr 22, 2025
8ad4c33
Try to use all threads with cmake
amd-justchen Apr 22, 2025
9d4d4af
Try to use 96 threads with fetch src
amd-justchen Apr 22, 2025
ad6adbf
Remove deprecated timestamping and cleanup
amd-justchen Apr 22, 2025
ab95169
Merge branch 'main' into users/justchen/win-use-azvm-tmpdisk
amd-justchen Apr 22, 2025
32f501f
Remove blank space for pre-commit validation
amd-justchen Apr 22, 2025
7d48dd8
Trim trailing whitespace for pre-commit validation
amd-justchen Apr 22, 2025
b24a2e5
Remove trailing space on empty line
amd-justchen Apr 22, 2025
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
35 changes: 23 additions & 12 deletions .github/workflows/build_windows_packages.yml
Original file line number Diff line number Diff line change
Expand Up @@ -27,22 +27,25 @@ jobs:
strategy:
fail-fast: true
env:
BASE_BUILD_DIR_POWERSHELL: C:\mnt\azure\b
CACHE_DIR: ${{ github.workspace }}/.cache
CCACHE_DIR: "${{ github.workspace }}/.cache/ccache"
CCACHE_MAXSIZE: "700M"
BASE_BUILD_DIR_POWERSHELL: B:\tmpbuild
CACHE_DIR: "${{github.workspace}}/.cache"
CCACHE_DIR: "${{github.workspace}}/.cache/ccache"
CCACHE_MAXSIZE: "4000M"
steps:
- name: "Create build dir"
shell: powershell
run: |
$currentTime = (Get-Date).ToString('HHmmss')
$buildDir = "$env:BASE_BUILD_DIR_POWERSHELL\$currentTime"
$buildDir = "$env:BASE_BUILD_DIR_POWERSHELL\"
echo "BUILD_DIR_POWERSHELL=$buildDir" >> $env:GITHUB_ENV
mkdir "$buildDir"
Write-Host "Generated Build Directory: $buildDir"
$bashBuildDir = $buildDir -replace '\\', '/' -replace '^C:', '/c'
$bashBuildDir = $buildDir -replace '\\', '/' -replace '^B:', '/b'
echo "BUILD_DIR_BASH=$bashBuildDir" >> $env:GITHUB_ENV
Write-Host "Converted Build Directory For Bash: $bashBuildDir"
$fs = Get-PSDrive -PSProvider "FileSystem"
$fsout = $fs | Select-Object -Property Name,Used,Free,Root
$fsout | % {$_.Used/=1GB;$_.Free/=1GB;$_} | Write-Host
get-disk | Select-object @{Name="Size(GB)";Expression={$_.Size/1GB}} | Write-Host
Comment on lines 35 to +48

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My general preference for this sort of runner-specific code is to put it in the runner configuration itself, possibly in:

That way there can be multiple workflows sharing the same runners without needing to repeat the setup code and the workflows can stay focused on the core steps that a developer would do to reproduce the job on any machine: clone, configure, build.

You could also extract it out to a github action, maybe with a "post" step for the system health logging


- name: "Checking out repository"
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
Expand All @@ -59,12 +62,13 @@ jobs:
run: |
echo "CCACHE_DIR=${CCACHE_DIR}"
df -h
ccache -z
mkdir -p $CCACHE_DIR
cmake --version
echo "python: $(which python), python3: $(which python3)"
echo "Git version: $(git --version)"
git config fetch.parallel 10
nthreads=$(nproc --all)
echo [*] Logical Processors to be used: $nthreads...

# TODO: We shouldn't be using a cache on actual release branches, but it
# really helps for iteration time.
Expand All @@ -78,13 +82,16 @@ jobs:

- name: Fetch sources
run: |
python ./build_tools/fetch_sources.py --jobs 12
python ./build_tools/fetch_sources.py --jobs 96
Comment on lines 83 to +85

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

96 jobs for fetching seems like a lot... can the network actually support that? I would expect it to get pretty severely bottlenecked. We could also teach fetch_sources.py to infer from nproc or use some other heuristic to pick a good default.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I think my intention was to base it on threads and also was experimenting with it. I am not 100% clear on how that argument percolates to the git commands that are run, but from what I understand it passes the jobs argument to the spawned git processes. I think there shouldn't be a network bottleneck (the D96ads_v6 gets 40Gbps of max network bandwidth) assuming Github's servers are adequate.

The theory was, most of the time was spent checking out the many submodules, and applying patches to them, which seemed like a CPU intensive process, so attempt to increase the simultaneous processing of them, but I think it didn't help much as it looked like the whole process is done serially from observing task manager while it was in this step.

All in all though, I think we are trying to prevent having to do a full clone every time and persistently store the source repo files so it can just pull diffs if need be.


- name: Configure MSVC
uses: ilammy/msvc-dev-cmd@0b201ec74fa43914dc39ae48a89fd1d8cb592756 # v1.13.0

- name: Configure Projects
run: |
# clear cache before build and after download
ccache -z

# Generate a new build id.
package_version="${{ inputs.package_version }}"
amdgpu_families="${{ inputs.amdgpu_families }}"
Expand Down Expand Up @@ -114,7 +121,7 @@ jobs:
-DTHEROCK_ENABLE_ML_LIBS=OFF

- name: Build Projects
run: cmake --build "${{ env.BUILD_DIR_BASH }}"
run: cmake --build "${{ env.BUILD_DIR_BASH }}" -j $(nproc --all)
Comment on lines -117 to +124

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ninja should automatically run with the right parallelism here. Did it not?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah so, this is a lingering question and was trying that argument out just to test since I was observing the CPU usage metrics from the Azure dashboard for the node only hitting ~60 vcpu usage when i expected 96. It could be that the kubernetes configuration is limiting resource limits but I read it should by default allow pods to use max CPU resources. My next idea was to output CPU metrics monitoring logs from within the VM from either native windows performance counters or hwinfo32 as that may give a more accurate perspective of usage.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually as of now it seems to be using nearly all 96
image


- name: Report
if: ${{ !cancelled() }}
Expand All @@ -133,7 +140,11 @@ jobs:
path: ${{ env.CACHE_DIR }}
key: windows-build-packages-v2-${{ github.sha }}

- name: "Clean up build dir"
- name: "Build size report"
if: always()
run: if (Test-Path -Path "$Env:BUILD_DIR_POWERSHELL") {Remove-Item -Path "$Env:BUILD_DIR_POWERSHELL" -Recurse -Force}
run: |
$fs = Get-PSDrive -PSProvider "FileSystem"
$fsout = $fs | Select-Object -Property Name,Used,Free,Root
$fsout | % {$_.Used/=1GB;$_.Free/=1GB;$_} | Write-Host
get-disk | Select-object @{Name="Size(GB)";Expression={$_.Size/1GB}} | Write-Host
shell: powershell