Define platform testing policy #120337
Pull Request Overview
This PR introduces a new documentation file that defines a platform testing policy for the .NET repository. The policy aims to optimize testing coverage across different platforms while managing costs by strategically selecting which platform versions to test for different types of code changes.
Key changes:
- Establishes a testing policy where `main` branch PRs test on latest supported platforms and `servicing` branch PRs test on oldest supported platforms
- Documents the assumption that intermediate platform versions have sufficient coverage through testing the extremes
- Defines the scope of .NET lifecycle maintenance covering three versions: current development, previous release, and the release before that
Co-authored-by: Copilot <[email protected]>
1. Latest supported
2. Oldest supported

We assume that all supported platform versions in between have sufficient coverage based on the latest and the oldest. We currently have no defined strategy for pre-release versions.
I think this is an oversimplification. We have 3 categories of tests:
- src\tests... - core runtime tests, JIT tests for the most part. The architecture matrix matters for these. The OS flavor and OS version matrix does not matter much.
- src\libraries...*Tests - libraries (API) tests. The OS and OS version matrix matters for these. Globalization, crypto, I/O, networking, ... are known to have many differences between OS flavors and versions. The architecture matrix does not matter a lot.
- Other - IL linker tests, host tests, ... Platform-neutral code for the most part. OS version and architecture specifics do not matter a lot.
We have 3 different strategies among these 3 categories today:
- Core runtime tests: Matrix is focused on architecture coverage.
- Libraries tests: Matrix is focused on OS variety coverage.
- Other: Matrix does not matter a whole lot. Also, these tests are very cheap, so testing more than strictly necessary is not a big deal.
The OS version mix strategy that you are proposing is fine for core runtime tests. (It is fine since OS versions do not matter for core runtime tests.)
I am not convinced that it is a good tradeoff for libraries tests. I expect that we would see quite a few OS flavor/version specific breaks sneak through over time. It will create work for engineers on the libraries teams. They will need to remember to trigger optional legs for changes in sensitive areas and they will need to deal with breaks that sneak through. If we go with this plan, I would like to see an explicit ack from @artl93 that the extra manual work is worth the saved machine costs. (IIRC, libraries tests are well structured and running them does not cost much. My mental picture is that Wasm/Browser testing costs about as much as all libraries testing on the many different OSes that we test libraries on.)
I agree that libraries tests care more about OS breadth. However, my proposal would still include running an arbitrarily large set of OSes, but limiting to the latest version in PR. My thinking there is that the likelihood of breaking a version that's not the latest, or a servicing release of an OS causing a break not seen in the latest, is fairly low. Do you disagree?
Also, I would be fine with an addendum that adds, say, a rolling build of the runtime tests against oldest supported versions. That way people ideally would not need to queue manual runs to catch these things.
> My thinking there is that the likelihood of breaking a version that's not the latest

Taking an accidental dependency on a new API that's unavailable in the oldest supported version is plausible. I have signoff on #120358 where we did exactly that earlier today. It manifested as a build break since the dependency was from C code. If the dependency was via P/Invoke, it would be a runtime failure that would not be caught if we were testing on the newest version only. I do not have a good idea about how many breaks we would see sneak through. My guess is that we would see one break per month sneak through.
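The build-break versus runtime-failure distinction above can be illustrated with a rough analogy in Python's `ctypes`, which, like P/Invoke, resolves native symbols lazily at run time rather than at build time. This is only a sketch: it assumes a POSIX loader (`dlopen(NULL)` semantics), and the missing symbol name is hypothetical, standing in for an API that only a newer OS version exports.

```python
import ctypes
import sys


def symbol_available(name: str) -> bool:
    """Return True if `name` resolves among the symbols loaded in this process."""
    if sys.platform == "win32":
        # Assumption: this sketch relies on POSIX dlopen(NULL) behavior.
        raise RuntimeError("sketch assumes a POSIX dynamic loader")
    process = ctypes.CDLL(None)  # handle to symbols visible to this process
    try:
        # Lookup happens here, at run time, much like the first P/Invoke call.
        getattr(process, name)
    except AttributeError:
        return False
    return True


# `getpid` exists on every POSIX libc; the second name is hypothetical and
# stands in for an API that is missing on the oldest supported OS version.
print(symbol_available("getpid"))
print(symbol_available("hypothetical_api_added_in_a_newer_os"))
```

Nothing fails until the lookup runs on a machine where the symbol is absent, which is why a test leg on the newest OS version alone would not surface this kind of break.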
Here is the set of 3 Linux x64 flavors we run for libraries in CI in main today:
runtime/eng/pipelines/libraries/helix-queues-setup.yml
Lines 75 to 77 in 0082733
- Ubuntu.2204.Amd64.Open
- (AzureLinux.3.0.Amd64.Open)[email protected]/dotnet-buildtools/prereqs:azurelinux-3.0-helix-amd64
- (Centos.10.Amd64.Open)[email protected]/dotnet-buildtools/prereqs:centos-stream-10-helix-amd64
This maximizes coverage (variety) while minimizing costs:
- Redhat-based distribution vs. Debian-based distribution vs. special Azure Linux 3
- Physical OS vs. containers
- OpenSSL crypto provider vs. SymCrypt provider
- Older (Ubuntu 22 is 2022) vs. newer
If we were to go with the plan to bump everything, AzureLinux3 and CentOS10 are latest available so no change there. Ubuntu 22 would need to be replaced by Ubuntu 25. We would give up testing on older distros by doing that. I expect that we would give up testing on physical OS too since we would not want to pay for creating Ubuntu 25 OS images with short shelf life (Ubuntu 25 is not LTS). At that point, we may give up Ubuntu completely since it is not differentiated enough from Azure Linux 3 and CentOS anymore, and we can reduce the set down to just AzureLinux3 and CentOS10. I am sharing this thought process to show that there is interplay between the different dimensions of the matrix.
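The interplay between matrix dimensions can be made concrete with a small sketch that computes which values each set of queues covers and what a swap gives up. The per-queue attribute values below are assumptions inferred from the bullets above, not taken from the pipeline definition, and the Ubuntu 25 entry is hypothetical.

```python
from dataclasses import dataclass

DIMENSIONS = ("family", "host", "crypto", "age")


@dataclass(frozen=True)
class Queue:
    name: str
    family: str  # distro lineage
    host: str    # "physical" or "container"
    crypto: str  # crypto provider
    age: str     # "older" or "newer"


def coverage(queues):
    """Map each dimension to the set of values exercised by `queues`."""
    return {dim: {getattr(q, dim) for q in queues} for dim in DIMENSIONS}


def lost_coverage(before, after):
    """Values covered by `before` that `after` no longer covers."""
    old, new = coverage(before), coverage(after)
    return {dim: old[dim] - new[dim] for dim in DIMENSIONS if old[dim] - new[dim]}


# Assumed attribute values, for illustration only.
ubuntu22 = Queue("Ubuntu.2204", "debian", "physical", "openssl", "older")
azurelinux3 = Queue("AzureLinux.3.0", "azurelinux", "container", "symcrypt", "newer")
centos10 = Queue("Centos.10", "redhat", "container", "openssl", "newer")
ubuntu25 = Queue("Ubuntu.25 (hypothetical)", "debian", "container", "openssl", "newer")

today = [ubuntu22, azurelinux3, centos10]
bumped = [ubuntu25, azurelinux3, centos10]

print(lost_coverage(today, bumped))  # {'host': {'physical'}, 'age': {'older'}}
```

Under these assumed attributes, swapping one queue drops two dimension values at once, which is the point of the example: a version bump is not a change along a single axis.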
I like this example.
> AzureLinux3 and CentOS10 are latest available so no change there.

Agreed, I think those should stay as-is.

> Ubuntu 22 would need to be replaced by Ubuntu 25. We would give up testing on older distros by doing that

Yup. I think we have a choice here on whether we prefer to do the absolute latest Ubuntu, regardless of LTS status, or the latest LTS. I would be fine with either one. I don't particularly like the current choice of 22.04, as it is the oldest supported and doesn't give us coverage of 24.04, which is probably the most commonly used Ubuntu by now. I'd rather main catch problems on the leading edge vs the trailing edge.

> I expect that we would give up testing on physical OS too since we would not want to pay for creating Ubuntu 25 OS images with short shelf life (Ubuntu 25 is not LTS)

I'm fine with giving up testing physical images entirely.

> At that point, we may give up Ubuntu completely since it is not differentiated enough from Azure Linux 3 and CentOS anymore

I think we should still have a Debian distribution, since they carry their own patches to common base libraries.

> I am sharing this thought process to show that there is interplay between the different dimensions of the matrix.

Agreed. Different distributions decide that their versions mean different things so it's hard to pick just one policy. Nevertheless, "pick the latest" seems like a decent rule of thumb for `main`.
> Taking an accidental dependency on a new API that's unavailable in the oldest supported version is plausible. I have signoff on #120358 where we did exactly that earlier today. It manifested as a build break since the dependency was from C code. If the dependency was via P/Invoke, it would be a runtime failure that would not be caught if we were testing on the newest version only. I do not have a good idea about how many breaks we would see sneak through. My guess is that we would see one break per month sneak through.

This sounds like a +1 to "rolling build of oldest version" for main. This doesn't sound common enough that I feel the need to check it every PR, but common enough that I wouldn't want this to slip too far without us noticing. Weekly seems reasonable, although we could see what "daily" does to the budget.
> I think we should still have a Debian distribution, since they carry their own patches to common base libraries.

Should it be rolling based on the same reasoning that pushes older versions to rolling?
Debian-specific patches are minor compared to several years' worth of changes in the Linux ecosystem. Assuming we are testing on some latest-era Linux distro, it is more likely for us to introduce an issue that is specific to older Linux than an issue specific to Debian. If we are moving the former to rolling as not worth having in CI, we should move the latter to rolling as well.
Works for me. That is, at least for runtime. I'll defer to libraries on what they think is important for their test suite.
Expanded the definition of a platform to include architectures, OSes, OS flavors, and crypto stacks. Updated the testing policy to clarify versioning strategy for platform coverage.
We want to mix and match platform versions and .NET versions to produce good platform coverage without too much cost. This means we want to catch breaks on each platform as quickly as possible, and prioritize catching the type of platform breaks that are most likely to affect the specific version being tested.

* `main` - PRs run on the *latest supported* platform.
* `servicing` - PRs run on the *oldest supported* platform.
If we change what we test during servicing vs. main, it is likely that we would need to spend time around every release to stabilize the servicing tree on a new set of OSes.
Yup, I think that's desirable since we want to watch the oldest supported versions somewhere, and servicing is where it matters the most. That's because servicing releases can have older platforms go out of support in the middle of the servicing lifetime.
Added some language below to describe what I think should happen in very broad terms whenever we release a new version.
I expect this to be more detailed when we actually refactor the helix queue definitions to follow this policy.
* `main` - PRs run on the *latest supported* platform.
* `servicing` - PRs run on the *oldest supported* platform.

The above policy only applies to PRs. Scheduled or incidental runs can be queued against other platform definitions, if deemed necessary.
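As a minimal sketch of the PR rule quoted above, the selection could look like the following. The function name, the branch labels as plain strings, and the sample version list are illustrative assumptions; the actual helix queue definitions are not modeled here.

```python
def pr_platform_version(branch: str, supported: list[str]) -> str:
    """Pick the platform version a PR leg runs on.

    `supported` is assumed to be ordered from oldest to newest and to contain
    only versions that are currently in support.
    """
    if not supported:
        raise ValueError("no supported versions listed")
    if branch == "main":
        return supported[-1]  # latest supported
    if branch == "servicing":
        return supported[0]   # oldest supported
    raise ValueError(f"no PR policy defined for branch {branch!r}")


# Hypothetical version list, for illustration only.
ubuntu_supported = ["22.04", "24.04"]
print(pr_platform_version("main", ubuntu_supported))       # 24.04
print(pr_platform_version("servicing", ubuntu_supported))  # 22.04
```

Scheduled or rolling runs are deliberately left out of this function, since the policy text scopes the rule to PRs only.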
Should we have automatic triggers of the optional legs that cover more versions for areas that are known to have significant differences between OS versions, such as crypto, so that devs working in these areas do not have to think about it?
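One way such area-based triggers could work is a mapping from path patterns to extra legs, evaluated against the files a PR changes. The sketch below is an assumption-laden illustration: the patterns, the leg names, and the `extra_legs` helper are all hypothetical and do not reflect any existing pipeline configuration.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping from sensitive area paths to optional legs.
EXTRA_LEGS_BY_AREA = {
    "src/libraries/System.Security.Cryptography/*": {"crypto-oldest-os"},
    "src/libraries/System.Net.*": {"networking-oldest-os"},
}


def extra_legs(changed_paths):
    """Return the optional legs to queue for the given changed file paths."""
    legs = set()
    for path in changed_paths:
        for pattern, area_legs in EXTRA_LEGS_BY_AREA.items():
            # fnmatchcase: `*` also matches `/`, so one pattern covers a subtree.
            if fnmatchcase(path, pattern):
                legs |= area_legs
    return legs


print(extra_legs(["src/libraries/System.Security.Cryptography/src/X509.cs"]))
print(extra_legs(["docs/README.md"]))
```

With a mapping like this, a change under a sensitive area would queue the extra version coverage automatically, and changes elsewhere would queue nothing extra.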
Added as an open question. I don't have an opinion either way.
Added an open question regarding area paths triggering additional version testing.
Added instructions for handling servicing releases in PR configuration.
Tagging subscribers to this area: @dotnet/area-meta
Co-authored-by: Jan Kotas <[email protected]>
This could get more complicated, but I want to make it simple to start