Try to untangle the rid calculation. #82832

Merged: 38 commits, Apr 21, 2023

@tmds
Member

tmds commented Mar 1, 2023

This is an attempt at simplifying how the rids get calculated.

Instead of providing a single rid, init-distro-rid.sh also provides a __PortableOS value, which is the portable OS that matches the target platform. It is also initialized for non-portable builds, which enables detecting musl in the native script and passing that information along. The existing __DistroRid is used solely for naming the non-portable build output.

Instead of calculating the _portableOS of the old __DistroRid, Directory.Build.props can now directly use __PortableOS (and fall back to using TargetOS).

_runtimeOS is merged into _packageOS.
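
As a rough illustration of the fallback described above (not the PR's actual diff), a standalone project could resolve the portable OS as follows. It assumes that __PortableOS reaches MSBuild as an environment variable or a global property. The `_sketchPortableOS` property and the `ShowPortableOS` target are hypothetical names, and the `linux` default for TargetOS exists only so the file runs on its own.

```xml
<Project>
  <PropertyGroup>
    <!-- Default only so this sketch runs standalone; the real build sets TargetOS elsewhere. -->
    <TargetOS Condition="'$(TargetOS)' == ''">linux</TargetOS>

    <!-- Assumption: __PortableOS arrives as an environment variable or a /p: property. -->
    <_sketchPortableOS Condition="'$(__PortableOS)' != ''">$(__PortableOS)</_sketchPortableOS>
    <!-- Fall back to TargetOS when the native script did not provide a value. -->
    <_sketchPortableOS Condition="'$(_sketchPortableOS)' == ''">$(TargetOS.ToLowerInvariant())</_sketchPortableOS>
  </PropertyGroup>

  <!-- Hypothetical helper target that prints the resolved value. -->
  <Target Name="ShowPortableOS">
    <Message Importance="high" Text="portable OS: $(_sketchPortableOS)" />
  </Target>
</Project>
```

Saved as `sketch.proj` (an arbitrary file name), `dotnet msbuild sketch.proj -t:ShowPortableOS -p:__PortableOS=linux-musl` should report the value that was passed in, while omitting the property should fall back to the lower-cased TargetOS.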

Let's see how well this works against the CI configurations. 🤞

cc @ViktorHofer @am11 @jkotas

@ghost added the community-contribution label Mar 1, 2023
@dotnet-issue-labeler

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

@tmds
Member Author

tmds commented Mar 2, 2023

CI is looking good for these changes, with the exception of the coreclr linux_musl_arm64 job.

This is a cross-build job, and I'm not familiar with those.

From the error message, I assume the wrong linker gets used for the crossgen2 project?
Or maybe, the build is supposed to use the crossgen2_crossarch project instead of the crossgen2 project?

@wfurt @am11 any thoughts on what changes in this PR are causing this failure?

  Generating native code
  /usr/bin/ld: unrecognised emulation mode: aarch64linux
  Supported emulations: elf_x86_64 elf32_x86_64 elf_i386 elf_iamcu elf_l1om elf_k1om i386pep i386pe
clang : error : linker command failed with exit code 1 (use -v to see invocation) [/__w/1/s/src/coreclr/tools/aot/crossgen2/crossgen2.csproj]
##[error]clang(0,0): error : (NETCORE_ENGINEERING_TELEMETRY=Build) linker command failed with exit code 1 (use -v to see invocation)
/__w/1/s/artifacts/transport/coreclr/build/Microsoft.NETCore.Native.targets(334,5): error MSB3073: The command ""/usr/bin/clang-15" "/__w/1/s/artifacts/obj/coreclr/crossgen2/arm64/Debug/native/crossgen2.o" -o "/__w/1/s/artifacts/bin/coreclr/linux.arm64.Debug/crossgen2/native/crossgen2" /__w/1/s/artifacts/transport/coreclr/aotsdk/libbootstrapper.a /__w/1/s/artifacts/transport/coreclr/aotsdk/libRuntime.ServerGC.a /__w/1/s/artifacts/transport/coreclr/aotsdk/libeventpipe-disabled.a /__w/1/s/artifacts/transport/coreclr/aotsdk/libstdc++compat.a /__w/1/s/artifacts/transport/coreclr/aotsdk/libnumasupportdynamic.a /__w/1/s/artifacts/bin/microsoft.netcore.app.runtime.linux-musl-arm64/Release/runtimes/linux-musl-arm64/native/libSystem.Native.a /__w/1/s/artifacts/bin/microsoft.netcore.app.runtime.linux-musl-arm64/Release/runtimes/linux-musl-arm64/native/libSystem.Globalization.Native.a /__w/1/s/artifacts/bin/microsoft.netcore.app.runtime.linux-musl-arm64/Release/runtimes/linux-musl-arm64/native/libSystem.IO.Compression.Native.a /__w/1/s/artifacts/bin/microsoft.netcore.app.runtime.linux-musl-arm64/Release/runtimes/linux-musl-arm64/native/libSystem.Net.Security.Native.a /__w/1/s/artifacts/bin/microsoft.netcore.app.runtime.linux-musl-arm64/Release/runtimes/linux-musl-arm64/native/libSystem.Security.Cryptography.Native.OpenSsl.a --sysroot=/crossrootfs/arm64 --target=aarch64-linux-gnu -g -Wl,-rpath,'$ORIGIN' -Wl,--build-id=sha1 -Wl,--as-needed -pthread -ldl -lz -lrt -lm -pie -Wl,-pie -Wl,-z,relro -Wl,-z,now -Wl,--discard-all -Wl,--gc-sections" exited with code 1. [/__w/1/s/src/coreclr/tools/aot/crossgen2/crossgen2.csproj]
##[error]artifacts/transport/coreclr/build/Microsoft.NETCore.Native.targets(334,5): error MSB3073: (NETCORE_ENGINEERING_TELEMETRY=Build) The command ""/usr/bin/clang-15" "/__w/1/s/artifacts/obj/coreclr/crossgen2/arm64/Debug/native/crossgen2.o" -o "/__w/1/s/artifacts/bin/coreclr/linux.arm64.Debug/crossgen2/native/crossgen2" /__w/1/s/artifacts/transport/coreclr/aotsdk/libbootstrapper.a /__w/1/s/artifacts/transport/coreclr/aotsdk/libRuntime.ServerGC.a /__w/1/s/artifacts/transport/coreclr/aotsdk/libeventpipe-disabled.a /__w/1/s/artifacts/transport/coreclr/aotsdk/libstdc++compat.a /__w/1/s/artifacts/transport/coreclr/aotsdk/libnumasupportdynamic.a /__w/1/s/artifacts/bin/microsoft.netcore.app.runtime.linux-musl-arm64/Release/runtimes/linux-musl-arm64/native/libSystem.Native.a /__w/1/s/artifacts/bin/microsoft.netcore.app.runtime.linux-musl-arm64/Release/runtimes/linux-musl-arm64/native/libSystem.Globalization.Native.a /__w/1/s/artifacts/bin/microsoft.netcore.app.runtime.linux-musl-arm64/Release/runtimes/linux-musl-arm64/native/libSystem.IO.Compression.Native.a /__w/1/s/artifacts/bin/microsoft.netcore.app.runtime.linux-musl-arm64/Release/runtimes/linux-musl-arm64/native/libSystem.Net.Security.Native.a /__w/1/s/artifacts/bin/microsoft.netcore.app.runtime.linux-musl-arm64/Release/runtimes/linux-musl-arm64/native/libSystem.Security.Cryptography.Native.OpenSsl.a --sysroot=/crossrootfs/arm64 --target=aarch64-linux-gnu -g -Wl,-rpath,'$ORIGIN' -Wl,--build-id=sha1 -Wl,--as-needed -pthread -ldl -lz -lrt -lm -pie -Wl,-pie -Wl,-z,relro -Wl,-z,now -Wl,--discard-all -Wl,--gc-sections" exited with code 1.

Build FAILED.

@am11
Member

am11 commented Mar 2, 2023

Just a heads up: we spent days testing during the previous consolidation of the RID calculation from various places in the repo. Many scenarios are not obvious and are not covered by the CI. Also, some folks only run their build flavors near the releases. Some edge cases from memory:

  • Visual Studio build failures (managed libraries development relies heavily on it)
  • Official packages (issues show up when package validation runs; not accessible via public CI)
  • Tizen: uses a hybrid approach, with a portable build for coreclr and a non-portable build for libs (no source-build infra involved)
  • FreeBSD / illumos: $other Unices when cross-built from Linux && when built on the actual VM. We now have good CI coverage for the FreeBSD cross-build (which also implies the illumos cross-build is happy). I will try to run the build on their hosts too (once this gets closer to completion).
  • Alpine Linux: cross and native builds. There are old and new CI legs, some of which only run during the official build.

We also had issues with what goes into the official NuGet packages, and we went back and forth on fixing those.

@tmds
Member Author

tmds commented Apr 21, 2023

CI is looking good.

@@ -253,11 +250,13 @@
<OutputRID Condition="'$(OutputRID)' == ''">$(_outputOS)-$(TargetArchitecture)</OutputRID>
</PropertyGroup>

-<PropertyGroup Label="CalculateTargetOSName" Condition="'$(SkipInferTargetOSName)' != 'true'">
+<PropertyGroup Label="CalculateTargetOSName">
Member

Nit: Would be good to move this up in a follow-up and add TargetsMobile into the group as well.

@@ -73,7 +73,7 @@
<TestContextVariable Include="NUGET_PACKAGES=$(TestRestorePackagesPath)" />
<TestContextVariable Include="TEST_ARTIFACTS=$(SystemPathTestsOutputDir)" />
<TestContextVariable Include="TEST_TARGETRID=$(TestTargetRid)" />
-<TestContextVariable Include="BUILDRID=$(OutputRid)" />
+<TestContextVariable Include="BUILDRID=$(OutputRID)" />
Member

Nit: Should the test context variable be changed in the future to OutputRID?

<InnerBuildArgs>$(InnerBuildArgs) /p:RuntimeOS=$(RuntimeOS)</InnerBuildArgs>
<!-- PackageOS and ToolsOS control the rids of prebuilts consumed by the build.
They are set to RuntimeOS so they match with the build SDK rid. -->
<InnerBuildArgs Condition="'$(RuntimeOS)' != ''">$(InnerBuildArgs) /p:PackageOS=$(RuntimeOS) /p:ToolsOS=$(RuntimeOS)</InnerBuildArgs>
Member

Who provides RuntimeOS in source build?

Member Author

Member

Is that still the right property to set or do we want to change that?

Member

@ViktorHofer left a comment

Approving with the caveat that I didn't understand all the changes in init-distro-rid.sh but if our builds are happy with the changes then so am I. Great work. Thanks a lot 💯

@ViktorHofer
Member

Test failures are unrelated. I will merge and if something starts to fail, we can revert the change anytime.

@ViktorHofer merged commit 21fb96b into dotnet:main Apr 21, 2023
@tmds
Member Author

tmds commented Apr 21, 2023

@ViktorHofer and @akoeplinger thanks for your support!

@ViktorHofer
Member

@tmat the SourceIndexStage leg in the dotnet/runtime official builds is currently broken with the following error:

build.cmd -subset libs.sfx+libs.oob -binarylog -os linux -ci

D:\a_work\1\s.dotnet\sdk\8.0.100-preview.3.23178.7\Sdks\NuGet.Build.Tasks.Pack\build\NuGet.Build.Tasks.Pack.targets(221,5): error NU5026: The file 'D:\a_work\1\s\artifacts\bin\native\net8.0-linux-Debug-x64\libSystem.IO.Ports.Native.so.dbg' to be packed was not found on disk. [D:\a_work\1\s\src\libraries\System.IO.Ports\pkg\runtime.linux-x64.runtime.native.System.IO.Ports.proj]
##[error].dotnet\sdk\8.0.100-preview.3.23178.7\Sdks\NuGet.Build.Tasks.Pack\build\NuGet.Build.Tasks.Pack.targets(221,5): error NU5026: (NETCORE_ENGINEERING_TELEMETRY=Build) The file 'D:\a_work\1\s\artifacts\bin\native\net8.0-linux-Debug-x64\libSystem.IO.Ports.Native.so.dbg' to be packed was not found on disk.

I will allow us 24 hours to investigate the issue, otherwise we will need to revert the change, unfortunately.

@ViktorHofer
Member

I submitted a fix for it to unblock official builds: #85215

The behavior change was unexpected. Previously, we didn't build the runtime.$(OutputRID).*.proj projects (only System.IO.Ports) in that leg, because OutputRID was set to a different value and the glob pattern matched nothing:

<!-- During an official Build, build the rid specific package matching the OutputRID only outside of an allconfigurations build and only when targeting the CoreCLR runtime.
The limitation on the CoreCLR runtime is entirely artificial but avoids duplicate assets being publish. -->
<ProjectReference Include="$(MSBuildThisFileDirectory)*\pkg\runtime.$(OutputRID).*.proj" Condition="'$(BuildingAnOfficialBuildLeg)' != 'true' or
('$(BuildAllConfigurations)' != 'true' and '$(RuntimeFlavor)' == '$(PrimaryRuntimeFlavor)')" />

Now that OutputRID points to the TargetOS, the glob matches and requires the libs.native subset to be built first. Do we want to change anything, or is that the expected new behavior?
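
To see why the value of OutputRID decides whether these projects are picked up, here is a minimal sketch of the same glob outside the real build. The item name `RidSpecificProject`, the `ShowRidProjects` target, and the `linux-x64` default are hypothetical; only the shape of the Include pattern is taken from the snippet above.

```xml
<Project>
  <PropertyGroup>
    <!-- Placeholder default; pass -p:OutputRID=... to try other values. -->
    <OutputRID Condition="'$(OutputRID)' == ''">linux-x64</OutputRID>
  </PropertyGroup>

  <ItemGroup>
    <!-- Same wildcard shape as the ProjectReference above: the RID is a literal part of the file name. -->
    <RidSpecificProject Include="$(MSBuildThisFileDirectory)*\pkg\runtime.$(OutputRID).*.proj" />
  </ItemGroup>

  <!-- Hypothetical helper target that lists whatever the glob matched. -->
  <Target Name="ShowRidProjects">
    <!-- An Include wildcard that matches no files yields an empty item list rather than an error. -->
    <Message Importance="high" Text="OutputRID=$(OutputRID): @(RidSpecificProject->'%(Filename)%(Extension)')" />
  </Target>
</Project>
```

Placed next to a directory layout like `System.IO.Ports\pkg\runtime.linux-x64.runtime.native.System.IO.Ports.proj`, the default value should list that project, while an OutputRID that appears in no file name should leave the list empty, which is the situation the leg relied on before.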

@tmds
Member Author

tmds commented Apr 23, 2023

> build.cmd -subset libs.sfx+libs.oob -binarylog -os linux -ci

I assume this issue is specific to this type of 'cross-build': on Windows we build for Linux.

Once we figure out what the rids were, we should be able to match the previous behavior.

> "$(MSBuildThisFileDirectory)*\pkg\runtime.$(OutputRID).*.proj"

What are these projects?

> I submitted a fix for it to unblock official builds: #85215

If you can unblock the official builds, we can fix this in the coming days.

@tmds
Member Author

tmds commented Apr 24, 2023

Previously, when we were building on Windows and set TargetOS to Linux, it would use a _packageOS of win for anything that is not TargetsMobile.

<_runtimeOS Condition="'$(_runtimeOS)' == ''">$(_parseDistroRid.SubString(0, $(_distroRidIndex)))</_runtimeOS>
...
<_runtimeOS Condition="'$(TargetsMobile)' == 'true'">$(TargetOS.ToLowerInvariant())</_runtimeOS>
...
<_portableOS Condition="'$(_runtimeOS)' == 'win' or '$(TargetOS)' == 'windows'">win</_portableOS>

To go back to the previous behavior, we can set /p:PackageOS=win in the job, or add something like:

<_packageOS Condition="'$(_hostOS)' == 'windows' and '$(TargetsMobile)' != 'true'">win</_packageOS>

I assume the goal of this configuration is being able to build managed sources on Windows for non-Windows targets.

@ViktorHofer can you verify on Windows if the build works as before by adding /p:PackageOS=win? Should I make a PR for the _packageOS assignment?

@ViktorHofer
Member

> I assume the goal of this configuration is being able to build managed sources on Windows for non-Windows targets.

Correct. The "source-index-stage" job which runs in our official builds is what powers https://source.dot.net. It just needs to build the managed libraries under src/libraries. We need to skip the native packages under https://github.com/dotnet/runtime/tree/main/src/libraries/System.IO.Ports/pkg as they currently require the native assets of the TargetOS to be built. Cross-building linux assets on a Windows OS isn't a supported scenario today AFAIK.

I submitted a workaround to fix the failing official builds via #85239. My previous attempt was flawed as I missed that we can't build the libs.native subset targeting Linux, on Windows.

> @ViktorHofer can you verify on Windows if the build works as before by adding /p:PackageOS=win? Should I make a PR for the _packageOS assignment?

Let me check. FWIW here's the build command:

sourceIndexBuildCommand: build.cmd -subset libs.native+libs.sfx+libs.oob -binarylog -os linux -ci

@tmds
Member Author

tmds commented Apr 24, 2023

> Correct. The "source-index-stage" job which runs in our official builds is what powers https://source.dot.net/.

Interesting to know, I was wondering why the job was building sources like that.

> I submitted a workaround to fix the failing official builds via #85239.

Thanks for looking into the fix.

@ViktorHofer
Member

ViktorHofer commented Apr 24, 2023

> Thanks for looking into the fix.

Unfortunately, appending /p:PackageOS=win didn't help:

C:\Program Files\dotnet\sdk\8.0.100-preview.3.23178.7\Sdks\NuGet.Build.Tasks.Pack\build\NuGet.Build.Tasks.Pack.targets(221,5): error NU5026: The file 'C:\git\runtime2\artifacts\bin\native\net8.0-linux-Debug-x64\libSystem.IO.Ports.Native.so.dbg' to be packed was not found on disk. [C:\git\runtime2\src\libraries\System.IO.Ports\pkg\runtime.linux-x64.runtime.native.System.IO.Ports.proj]

@tmds
Member Author

tmds commented Apr 24, 2023

> Unfortunately, appending /p:PackageOS=win didn't help:

Yes. I didn't understand well enough what these projects were and should have taken a closer look. Somehow I had assumed they'd be generated using PackageRID in their name and changing it would cause it to no longer match with OutputRID. But they are real projects with those literal rids in their name, and to avoid the match, we should have set OutputRID instead.

Thanks for getting the issue fixed.

@ViktorHofer
Member

> Thanks for getting the issue fixed.

Sure, but my fix was just meant as a workaround. Would it make sense to additionally add the following back?

<_portableOS Condition="'$(HostOS)' == 'win'">win</_portableOS>

@MichaelSimons
Member

@tmds - This change broke the stage2 build of source-build. The PackageRID is set to portable e.g. linux-x64 whereas previously it was non-portable. Looking at the changes, this looks suspect - https://github.com/dotnet/runtime/pull/82832/files#diff-9da24614831c308827a1ae533ffea392c97638c261dd42bd0f5226baa136d16eR210


    <!-- source-build sets PackageOS to build with non-portable rid packages that were source-built previously. -->
    <PackageRID Condition="'$(PackageOS)' != ''">$(PackageOS)-$(TargetArchitecture)</PackageRID>
    <PackageRID>$(_packageOS)-$(TargetArchitecture)</PackageRID>

The first PackageRID assignment will always be overridden by the second.

@tmds
Member Author

tmds commented Apr 25, 2023

> The first PackageRID assignment will always be overridden by the second.

Yes, this is an error. This should have been similar to:

<!-- source-build sets ToolsOS to build with non-portable rid packages that were source-built previously. -->
<ToolsRID Condition="'$(ToolsOS)' != ''">$(ToolsOS)-$(_hostArch)</ToolsRID>
<ToolsRID Condition="'$(ToolsRID)' == ''">$(_portableHostOS)-$(_hostArch)</ToolsRID>
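
Applied to PackageRID, the same two-step pattern might look like the sketch below. This is an illustration by analogy with the ToolsRID snippet, not the actual change. The `_packageOS` and `TargetArchitecture` defaults are placeholders so the file runs standalone, and the `ShowPackageRID` target is hypothetical.

```xml
<Project>
  <PropertyGroup>
    <!-- Placeholder values; the real build computes these elsewhere. -->
    <_packageOS Condition="'$(_packageOS)' == ''">linux</_packageOS>
    <TargetArchitecture Condition="'$(TargetArchitecture)' == ''">x64</TargetArchitecture>

    <!-- An explicit PackageOS wins when it is set. -->
    <PackageRID Condition="'$(PackageOS)' != ''">$(PackageOS)-$(TargetArchitecture)</PackageRID>
    <!-- The default applies only when nothing has been assigned yet.
         Without this Condition, the line below always overwrites the one above. -->
    <PackageRID Condition="'$(PackageRID)' == ''">$(_packageOS)-$(TargetArchitecture)</PackageRID>
  </PropertyGroup>

  <!-- Hypothetical helper target that prints the resolved RID. -->
  <Target Name="ShowPackageRID">
    <Message Importance="high" Text="PackageRID: $(PackageRID)" />
  </Target>
</Project>
```

Properties in a PropertyGroup are evaluated top to bottom, so the guard on the second assignment is what lets a value passed with `-p:PackageOS=...` survive. Removing that Condition reproduces the override reported above.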

@tmds
Member Author

tmds commented Apr 25, 2023

@MichaelSimons #85350 adds the missing Condition.

@ghost locked as resolved and limited conversation to collaborators May 26, 2023
Labels
community-contribution Indicates that the PR has been added by a community member
5 participants