Stuck on => [builder 13/20] RUN ./Engine/Build/BatchFiles/RunUAT.sh BuildGraph -target="Make Installed Build Linux" -script=Engine/Build/InstalledEngineBuild.xml -set:HostPlatformOnly=true #347

Open
dev-fredericfox opened this issue Jan 14, 2024 · 13 comments

@dev-fredericfox

Output of the ue4-docker info command:

Me@My-MBP ~ % ue4-docker info
/Library/Python/3.9/site-packages/urllib3/__init__.py:34: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(
ue4-docker version:         0.0.111 (latest available version is 0.0.111)
Operating system:           macOS 14.2.1 (Kernel Version 23.2.0)
Docker daemon version:      24.0.7
NVIDIA Docker supported:    No
Maximum image size:         No limit detected
Available disk space:       Unknown (typically means the Docker daemon is running in a Moby VM, e.g. Docker Desktop)
Total system memory:        128 GiB physical, 1 GiB virtual
CPU:                        16 physical, 16 logical (arm)

Additional details:

  • Are you accessing the network through a proxy server? No
  • Full Command: ue4-docker build custom:UE532v0.0.1 -repo=https://github.com/dev-fredericfox/UnrealEngine_release.git -branch=main -username=dev-fredericfox -password=ghp_REDACTED --exclude templates --monitor

The RUN ./Engine/Build/BatchFiles/RunUAT.sh BuildGraph -target="Make Installed Build Linux" step stalls at a seemingly random point, usually somewhere between 200/3994 and 600/3994, with no consistent location and no error message. The process keeps running but never makes further progress.

Example 01 Stuck at 234:

[+] Building 1070.9s (18/33)                                                                                                                                                                              docker:desktop-linux
 => [internal] load .dockerignore                                                                                                                                                                                         0.0s
 => => transferring context: 53B                                                                                                                                                                                          0.0s
 => [internal] load build definition from Dockerfile                                                                                                                                                                      0.0s
 => => transferring dockerfile: 8.11kB                                                                                                                                                                                    0.0s
 => [internal] load metadata for docker.io/adamrehn/ue4-build-prerequisites:opengl-ubuntu22.04                                                                                                                            0.0s
 => [internal] load metadata for docker.io/adamrehn/ue4-source:wyrdue532v0.0.1-opengl-ubuntu22.04                                                                                                                         0.0s
 => [internal] load build context                                                                                                                                                                                         0.0s
 => => transferring context: 307B                                                                                                                                                                                         0.0s
 => CACHED [stage-1 1/8] FROM docker.io/adamrehn/ue4-build-prerequisites:opengl-ubuntu22.04                                                                                                                               0.0s
 => [builder  1/20] FROM docker.io/adamrehn/ue4-source:wyrdue532v0.0.1-opengl-ubuntu22.04                                                                                                                                 0.0s
 => CACHED [builder  2/20] COPY set-changelist.py /tmp/set-changelist.py                                                                                                                                                  0.0s
 => CACHED [builder  3/20] RUN python3 /tmp/set-changelist.py /home/ue4/UnrealEngine/Engine/Build/Build.version $CHANGELIST && echo '' && echo 'RUN directive complete. Docker will now commit the filesystem layer to d  0.0s
 => CACHED [builder  4/20] RUN rm -rf /home/ue4/UnrealEngine/.git && echo '' && echo 'RUN directive complete. Docker will now commit the filesystem layer to disk.' && echo 'Note that for large filesystem layers this   0.0s
 => CACHED [builder  5/20] COPY enable-opengl.py /tmp/enable-opengl.py                                                                                                                                                    0.0s
 => CACHED [builder  6/20] RUN python3 /tmp/enable-opengl.py /home/ue4/UnrealEngine/Engine/Config/BaseEngine.ini && echo '' && echo 'RUN directive complete. Docker will now commit the filesystem layer to disk.' && ec  0.0s
 => CACHED [builder  7/20] COPY patch-filters-xml.py /tmp/patch-filters-xml.py                                                                                                                                            0.0s
 => CACHED [builder  8/20] RUN python3 /tmp/patch-filters-xml.py /home/ue4/UnrealEngine/Engine/Build/InstalledEngineFilters.xml && echo '' && echo 'RUN directive complete. Docker will now commit the filesystem layer   0.0s
 => CACHED [builder  9/20] COPY patch-build-graph.py /tmp/patch-build-graph.py                                                                                                                                            0.0s
 => CACHED [builder 10/20] RUN python3 /tmp/patch-build-graph.py /home/ue4/UnrealEngine/Engine/Build/InstalledEngineBuild.xml /home/ue4/UnrealEngine/Engine/Build/Build.version && echo '' && echo 'RUN directive comple  0.0s
 => CACHED [builder 11/20] RUN ./Engine/Build/BatchFiles/Linux/Build.sh ShaderCompileWorker Linux Development -SkipBuild -buildubt && echo '' && echo 'RUN directive complete. Docker will now commit the filesystem lay  0.0s
 => CACHED [builder 12/20] WORKDIR /home/ue4/UnrealEngine                                                                                                                                                                 0.0s
 => [builder 13/20] RUN ./Engine/Build/BatchFiles/RunUAT.sh BuildGraph     -target="Make Installed Build Linux"     -script=Engine/Build/InstalledEngineBuild.xml     -set:HostPlatformOnly=true     -set:WithDDC=tru  1070.9s
 => => # [229/3994] Compile Module.Chaos.3.cpp                                                                                                                                                                                
 => => # [230/3994] Link (lld) libUnrealEditor-TextureBuildUtilities.so                                                                                                                                                       
 => => # [231/3994] Compile Module.Chaos.10.cpp                                                                                                                                                                               
 => => # [232/3994] Compile Module.AppFramework.3.cpp                                                                                                                                                                         
 => => # [233/3994] Compile Module.OpenColorIOWrapper.cpp                                                                                                                                                                     
 => => # [234/3994] Link (lld) libUnrealEditor-OpenColorIOWrapper.so                    

Example 02 Stuck at 476:

[+] Building 1592.6s (18/33)                                                                                                                                                                              docker:desktop-linux
 => [internal] load build definition from Dockerfile                                                                                                                                                                      0.0s
 => => transferring dockerfile: 8.11kB                                                                                                                                                                                    0.0s
 => [internal] load .dockerignore                                                                                                                                                                                         0.0s
 => => transferring context: 53B                                                                                                                                                                                          0.0s
 => [internal] load metadata for docker.io/adamrehn/ue4-build-prerequisites:opengl-ubuntu22.04                                                                                                                            0.0s
 => [internal] load metadata for docker.io/adamrehn/ue4-source:wyrdue532v0.0.1-opengl-ubuntu22.04                                                                                                                         0.0s
 => [internal] load build context                                                                                                                                                                                         0.0s
 => => transferring context: 307B                                                                                                                                                                                         0.0s
 => CACHED [stage-1 1/8] FROM docker.io/adamrehn/ue4-build-prerequisites:opengl-ubuntu22.04                                                                                                                               0.0s
 => [builder  1/20] FROM docker.io/adamrehn/ue4-source:wyrdue532v0.0.1-opengl-ubuntu22.04                                                                                                                                 0.0s
 => CACHED [builder  2/20] COPY set-changelist.py /tmp/set-changelist.py                                                                                                                                                  0.0s
 => CACHED [builder  3/20] RUN python3 /tmp/set-changelist.py /home/ue4/UnrealEngine/Engine/Build/Build.version $CHANGELIST && echo '' && echo 'RUN directive complete. Docker will now commit the filesystem layer to d  0.0s
 => CACHED [builder  4/20] RUN rm -rf /home/ue4/UnrealEngine/.git && echo '' && echo 'RUN directive complete. Docker will now commit the filesystem layer to disk.' && echo 'Note that for large filesystem layers this   0.0s
 => CACHED [builder  5/20] COPY enable-opengl.py /tmp/enable-opengl.py                                                                                                                                                    0.0s
 => CACHED [builder  6/20] RUN python3 /tmp/enable-opengl.py /home/ue4/UnrealEngine/Engine/Config/BaseEngine.ini && echo '' && echo 'RUN directive complete. Docker will now commit the filesystem layer to disk.' && ec  0.0s
 => CACHED [builder  7/20] COPY patch-filters-xml.py /tmp/patch-filters-xml.py                                                                                                                                            0.0s
 => CACHED [builder  8/20] RUN python3 /tmp/patch-filters-xml.py /home/ue4/UnrealEngine/Engine/Build/InstalledEngineFilters.xml && echo '' && echo 'RUN directive complete. Docker will now commit the filesystem layer   0.0s
 => CACHED [builder  9/20] COPY patch-build-graph.py /tmp/patch-build-graph.py                                                                                                                                            0.0s
 => CACHED [builder 10/20] RUN python3 /tmp/patch-build-graph.py /home/ue4/UnrealEngine/Engine/Build/InstalledEngineBuild.xml /home/ue4/UnrealEngine/Engine/Build/Build.version && echo '' && echo 'RUN directive comple  0.0s
 => CACHED [builder 11/20] RUN ./Engine/Build/BatchFiles/Linux/Build.sh ShaderCompileWorker Linux Development -SkipBuild -buildubt && echo '' && echo 'RUN directive complete. Docker will now commit the filesystem lay  0.0s
 => CACHED [builder 12/20] WORKDIR /home/ue4/UnrealEngine                                                                                                                                                                 0.0s
 => [builder 13/20] RUN ./Engine/Build/BatchFiles/RunUAT.sh BuildGraph     -target="Make Installed Build Linux"     -script=Engine/Build/InstalledEngineBuild.xml     -set:HostPlatformOnly=true     -set:WithDDC=tru  1592.6s
 => => # [471/3994] Compile Module.Engine.18.cpp                                                                                                                                                                              
 => => # [472/3994] Compile Module.Engine.59.cpp                                                                                                                                                                              
 => => # [473/3994] Compile Module.Engine.12.cpp                                                                                                                                                                              
 => => # [474/3994] Compile Module.Engine.15.cpp                                                                                                                                                                              
 => => # [475/3994] Compile Module.Engine.20.cpp                                                                                                                                                                              
 => => # [476/3994] Compile Module.Engine.65.cpp   
@slonopotamus
Collaborator

How much RAM/CPUs is allocated to Docker VM?

@dev-fredericfox
Author

dev-fredericfox commented Jan 14, 2024

> How much RAM/CPUs is allocated to Docker VM?

50 GB RAM / 16 CPUs / 880 GB disk

@dev-fredericfox
Author

One of the issues seems to be related to multithreading, although this is really not my area of expertise.
When I reduce my Docker CPUs to 1, I don't get stuck in the compiling phase (this, however, takes several days).
However, the output is now clipped, so I am not sure how to keep debugging.

Output when running with only 1 CPU:

 => [builder 13/20] RUN ./Engine/Build/BatchFiles/RunUAT.sh BuildGraph     -target="Make Installed Build Linux"     -script=Engine/Build/InstalledEngin  235804.3s
 => => # LogShaderCompilers: Display:                         TBasePassPSFNoLightMapPolicySkylight - 5.17% of total time (compiled   38 times, average 77.80 sec, 
 => => # max 236.41 sec, min 44.25 sec)                                                                                                                           
 => => # LogShaderCompilers: Display: TBasePassPSFPrecomputedVolumetricLightmapLightingPolicySkylight - 4.47% of total time (compiled   32 times, average 79.90 se
 => => # c, max 239.03 sec, min 51.85 sec)                                                                                                                        
 => => # Log                                                                                                                                                      
 => => # [output clipped, log limit 2MiB reached]                                                                                                                 
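
The `[output clipped, log limit 2MiB reached]` marker comes from BuildKit's per-step log cap rather than from the Unreal build itself. A minimal sketch of one way to raise it, assuming a separate `docker-container` buildx builder is acceptable. The helper name `buildkit_log_opts` and the builder name `big-log` are made up for illustration, and whether ue4-docker's chained image builds work through such a builder has not been tested here:

```shell
# Print buildx --driver-opt flags that raise BuildKit's per-step log cap.
# $1: cap in MiB (hypothetical helper; BuildKit itself takes the value in bytes).
buildkit_log_opts() {
  bytes=$(( $1 * 1024 * 1024 ))
  printf '%s\n' "--driver-opt env.BUILDKIT_STEP_LOG_MAX_SIZE=${bytes} --driver-opt env.BUILDKIT_STEP_LOG_MAX_SPEED=${bytes}"
}

# Example (not run here): create and select a builder with a 100 MiB cap.
#   docker buildx create --use --name big-log $(buildkit_log_opts 100)
```

When the step is reproduced by hand in a `docker run` session instead, this cap does not apply and the output can simply be piped through `tee`.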

@slonopotamus
Collaborator

Given that you have plenty of RAM, it might possibly be easier to spin up a Linux VM and run ue4-docker inside it.

@TBBle
Collaborator

TBBle commented Jan 20, 2024

When running with only one CPU (which is also the default when doing the build in Hyper-V isolation on Windows), you may be bitten by an issue in the UE build system: the build management system keeps a whole CPU core busy checking for progress from the shader compiler processes, so the shader compiler processes themselves get little-to-no CPU time and don't progress. That leads to an unexpected tens-of-hours build.

I only found this in UE4. I sent them a bug report, but I don't recall them accepting my fix (a sleep in the loop checking for shader compiler progress) or otherwise addressing it; a single-core development environment is not supported, after all. In a multi-core environment, I couldn't demonstrate a build-time improvement from my fix either, which surprised me.

But that sounds like what you hit here in your single-CPU attempt. So maybe try with two cores and see whether that avoids both the compile hang and the shader compiler issue.
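
To illustrate the kind of fix mentioned above (a sleep in the polling loop): this is not the Unreal Engine code, which is C++, only a shell sketch of the same idea. A loop that sleeps between checks yields the CPU to the workers it is waiting on, whereas the same loop without the sleep keeps a core busy. The function name `wait_for_file` and its parameters are hypothetical:

```shell
# Poll for a marker file, sleeping between checks so the waiter does not
# monopolise a core. $1: path, $2: max checks, $3: seconds between checks.
wait_for_file() {
  path=$1; max_checks=$2; interval=$3
  checks=0
  while [ "$checks" -lt "$max_checks" ]; do
    [ -e "$path" ] && return 0
    checks=$((checks + 1))
    sleep "$interval"   # without this line the loop becomes a busy-wait
  done
  return 1
}
```

On a single core the difference matters most, because the waiter and the worker compete for the same CPU.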

@dev-fredericfox
Author

I see. I tested with 2 CPU cores and sadly it gets stuck fairly early in the process.

Sometimes when I cancel the process after being stuck I notice this error message (not always). Could be related?

Screenshot 2024-01-17 at 11 52 29

@TBBle
Collaborator

TBBle commented Jan 21, 2024

That error is Python seeing a Control-C in a thread, presumably because you're hitting Control-C to cancel the process. I don't believe it's related.

@dev-fredericfox
Author

dev-fredericfox commented Jan 23, 2024

@slonopotamus

> Given that you have plenty of RAM, it might possibly be easier to spin up a Linux VM and run ue4-docker inside it.

Trying that right now, but first tests show that even in a UTM VM (Debian 12 Rosetta Virtualization) it still gets stuck. I will try an emulation later, but the performance is going to be horrendous.

Screenshot 2024-01-24 at 00 39 12

@dev-fredericfox
Author

Emulation seems to be broken as well, or at least it's unreasonable to use. It has been "stuck" without progress on step [builder 11/20] for the past 8 hours, fans on full blast.
Are people running this package primarily on Windows? Why does it seem to affect only me?

@slonopotamus
Collaborator

We either run natively on Windows or on Linux.

@dev-fredericfox
Author

> We either run natively on Windows or on Linux.

But only on amd64, or does it also work on Linux ARM?

@dev-fredericfox
Author

I think I found a "sort of" workaround for now.

When step 13 gets stuck, I start the container with docker run -it and run the commands from step 13 manually inside it. When the compile freezes, I kill the tasks and, since the build is incremental, simply relaunch it. It looks to be working for now.
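
The kill-and-relaunch loop described above could be automated roughly as follows. This is only a sketch: `run_with_retries` is a hypothetical helper, it relies on `timeout` from coreutils being available inside the container, and it treats exceeding a wall-clock limit as a hang rather than detecting a stall directly.

```shell
# Re-run a command until it succeeds, killing any attempt that exceeds a time limit.
# $1: max attempts, $2: per-attempt limit for timeout(1) (e.g. 2h), rest: the command.
run_with_retries() {
  max_attempts=$1; limit=$2; shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    echo "attempt ${attempt}/${max_attempts}: $*" >&2
    if timeout -s KILL "$limit" "$@"; then
      return 0
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Example (not run here), from /home/ue4/UnrealEngine inside the container.
# Only the options visible in the issue title are shown; the rest of the
# command line is truncated in the build log and is not reproduced.
#   run_with_retries 10 2h ./Engine/Build/BatchFiles/RunUAT.sh BuildGraph \
#     -target="Make Installed Build Linux" \
#     -script=Engine/Build/InstalledEngineBuild.xml -set:HostPlatformOnly=true
```

The time limit has to be chosen generously, since an attempt that is still making progress when the limit expires is killed as well.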

The only question is: how do I manually commit this state as a layer so the build can proceed to step 14? I could do a docker commit, but AFAIK this creates a new image, so how will the ue4-docker script know to use the image with the manually committed changes?
Any input appreciated!

@TBBle
Collaborator

TBBle commented Jan 31, 2024

The only way you could use docker commit and then continue the image build from there would be to change the Dockerfile so that, at that point, it has a FROM referencing the image that docker commit created. It seems like a lot of hassle.
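
As a rough sketch of what that would involve: the container would first be committed to an image, and the `FROM` line of the builder stage rewritten to point at it. This assumes the stage is declared with `AS builder`, which matches the `[builder N/20]` labels in the log; the helper `retarget_builder_stage` and the image tag `ue4-manual:step13` are invented for illustration. The `COPY`/`RUN` steps already performed by hand would still have to be removed from the Dockerfile manually, which is part of the hassle.

```shell
# Rewrite the FROM line of the stage named "builder" to use a different base image.
# $1: image tag; the Dockerfile is read from stdin and the result written to stdout.
retarget_builder_stage() {
  awk -v image="$1" '
    toupper($1) == "FROM" && tolower($NF) == "builder" {
      print "FROM " image " AS builder"
      next
    }
    { print }
  '
}

# Example (not run here):
#   docker commit <container-id> ue4-manual:step13
#   retarget_builder_stage ue4-manual:step13 < Dockerfile > Dockerfile.resume
```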

Can you use docker exec to inspect the hung container build stage with top or similar? (I honestly don't remember if you can do that...) I kind-of suspect this is an Unreal-level bug: some kind of shared resource or busy-wait that's deadlocking. If your CPU load is causing your fans to run, then it thinks it's doing something. As I mentioned earlier, I know of at least one busy-wait that used to exist in the system, and may still.

(Actually, you can use top from outside the container to inspect the processes inside it, but I believe the defaults hide processes in different PID namespaces...)

Oh, right, you can reproduce this in a docker run, so you can definitely docker exec in and use top to inspect that state.

My guess is that you've got all your cores busy-waiting, and no actual build processes are advancing. The fact that the two-core build gets stuck early suggests this: the build accumulates more busy-waiters over time until they happen to fill all the available cores simultaneously. If that turns out to be the case, it may be possible to renice the busy-waiters from outside the container to get the build to resume progress. That'll be a little fiddly, but less so than trying to inject the manually-built container into the build workflow.
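
A minimal sketch of that inspect-and-renice idea, done here through `docker exec` rather than from the host so that it also applies when the container lives inside a Docker Desktop VM. The helper names, the threshold, and the assumption that `ps` (procps) and `renice` exist in the container image are mine and untested against this build. Which processes are actually the busy-waiters has to be confirmed by eye first, so the filter takes a command-name regex instead of guessing.

```shell
# Filter `ps -eo pid,pcpu,comm` output (on stdin) down to the PIDs whose CPU
# usage is at or above a threshold and whose command name matches a regex.
# $1: threshold in percent, $2: regex for the command name.
# Note: ps reports %CPU averaged over the process lifetime, not an instant value.
busy_pids() {
  awk -v min="$1" -v pat="$2" 'NR > 1 && $2 + 0 >= min + 0 && $3 ~ pat { print $1 }'
}

# Lower the priority of matching processes in a running container.
# $1: container name or ID, $2: CPU threshold in percent, $3: command-name regex.
renice_busy() {
  for pid in $(docker exec "$1" ps -eo pid,pcpu,comm | busy_pids "$2" "$3"); do
    docker exec --user root "$1" renice -n 19 -p "$pid"
  done
}

# Example (not run here):
#   docker exec <container-id> top -b -n 1 | head -n 20   # look before changing anything
#   renice_busy <container-id> 90 '<name-regex>'
```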
