Agenda for Feb 16, 2023 #287

Closed
foolip opened this issue Feb 16, 2023 · 2 comments
Labels
agenda Agenda item for the next meeting

Comments

foolip commented Feb 16, 2023

Here is the proposed agenda for our meeting on Feb 16, 2023:

Apologies for posting this agenda on the same day. If we move forward with #102 we'll need to get better about this, announcing at least 48 hours in advance.

Previous meeting: #278

foolip added the agenda label on Feb 16, 2023
nairnandu commented:

Attendees: @bkardell, @boazsender, @foolip, @jensimmons, @meyerweb, @nairnandu, @nt1m, @Rnyman, @tantek, @zcorpan

  • Updates on action items from last week:
  • Scoring: variants and multi-globals #256
    • [foolip] Agreement on the ideal behavior for this issue would be a good start
    • [gsnedders] Scoring as 1/8th would be a reasonable solution
    • [foolip] We need to find a volunteer to do this work
    • [gsnedders] Do we want to retroactively apply this for 2023?
    • [jensimmons] We should stage it first so that we can review the impact
    • [foolip] Consensus on the approach: when we have variants, we can group them using information from the manifest, score them individually, and divide by the number of variants in the group. The same applies to multi-global tests. (A rough sketch of this rule follows these notes.)
  • Finish the Interop Team Charter #275
    • [foolip] This should be finished before Interop 2024
    • [jensimmons] Let's separate them into different issues, so that we can talk about each one independently. We can aim to make these decisions before June.
    • AI: foolip will file independent issues
    • [foolip] Browser specific failures - on the question of having this be part of Interop scope - does anyone feel strongly about this?
    • [jensimmons] The browser-specific failures graph is problematic in that there is a perception that it covers all tests. Should we have more than one graph? We want to form an opinion on what the goal with graphs should be. I am good with this being discussed in this group (vs. the WPT group)
    • [zcorpan] Could this graph be per directory?
    • [gsnedders] Many things were proposed in the initial RFC. We do need to decide if this is a topic for Interop team
    • [boazsender] I do have some thoughts on this
    • [foolip] What would be the desired outcome?
    • [boazsender] Codify our perspective on the messaging we want to send out with WPT.
    • [jensimmons] We could re-design the homepage to make sure it aligns with the charter
    • [boazsender] There are stakeholders that are not part of the Interop team meeting. We should survey or get inputs from all contributors to WPT
    • [gsnedders] Should the default be status quo? Not deleting the graph because we can’t arrive at a consensus
    • [jensimmons] Maybe it should not be a binary decision. We definitely need to make improvements and that should be in the charter.
    • [bkardell] We should not leave it at status quo
    • [jensimmons] Agree. We do have a strong appetite to put effort into the re-design
    • [boazsender] In the very early days of WPT (at TPAC), there was concern about making wpt.fyi like an ACID test. Bringing that back to this group so that we keep it in mind when we redesign.
    • [jensimmons] The data itself is misleading. Safari dropped from ~5k to ~3k browser-specific failures because we shipped one feature (offscreen canvas). We were failing some of the tests because they had dependencies on the offscreen canvas feature. That graph is more reflective of how the two engines (Chromium and WebKit) differ in the way testing (and test workflow integration) is done.
    • [boazsender] The solution that group came up with was to just show the Interop score
    • [foolip] We could do two things for cases like offscreen canvas:
      • Weighting by feature
      • Filter by spec maturity
    • [jensimmons] We were failing ~1700 tests for features that Safari had shipped, but those tests depended on offscreen canvas, which we had not shipped at the time.
    • [nt1m] A metric or graph that covers the whole WPT project would be difficult to keep track of. Filtering would be a good first step.
    • Consensus: Yes to working on presenting the overall tests as part of defining the charter.
  • Reconsider the browser specific failures graph #270
    • This depends on the charter discussion
  • Review Interop 2022 dashboard changes #283
    • Questions:
    • [gsnedders] The scores above are based on the overall score, but the graph does not end at EOY
    • [jensimmons] The data on the dashboard should be frozen at the EOY (Dec. 31st). Should be static.
    • [tantek] I would have preferred that older dashboards retain the display that was decided for them with all their context at the time.
    • [zcorpan] Agree to switching to a static page
    • [tantek] If we are freezing the scores, we should rename the header to read something other than a “dashboard”. Like “Interop 2022 conclusion” or “Interop 2022 results”.
    • [foolip] Should we change the 2022 page to the old design?
    • [jensimmons] Agree with the idea of making any yearly changes to the dashboard be scoped to that particular year
    • [tantek] It sets the policy moving forward. Freeze numbers and presentation of the dashboard at the end of the year.
    • [jensimmons] If someone re-writes a test and that impacts a previous score - the old scores or static dashboard should not change
    • [nt1m] Changing the graph per focus area will be hard if we make that change
    • [foolip] We can freeze the data behind the graph
  • Handling test renames or splits (test suite evolution over time) #284
    • Not discussed, but is largely addressed by freezing past year’s dashboards.
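A rough sketch of the variant-grouping rule agreed above, assuming a simplified setup: results come in as a mapping from concrete test (including ?variant query strings and generated .any.*.html multi-global scopes) to pass/fail, and the manifest lookup is approximated by stripping those suffixes. The real scoring pipeline works from the WPT manifest, so treat this as an illustration of the math rather than the implementation.

```python
from collections import defaultdict
import re

# Approximation of "group using information from the manifest":
# collapse ?variant query strings and .any.* multi-global scopes
# into one logical test. (The manifest would give this directly.)
MULTI_GLOBAL_RE = re.compile(r"\.any(\.[a-z-]+)?\.html$")

def group_key(test_path: str) -> str:
    base = test_path.split("?", 1)[0]            # drop the variant query
    return MULTI_GLOBAL_RE.sub(".any.js", base)  # collapse generated globals

def interop_style_score(results: dict[str, bool]) -> float:
    """Score each group as the mean of its members, then sum the groups.

    A test with eight variants contributes at most 1 point in total,
    so a single variant is worth 1/8 rather than a whole point.
    """
    groups: dict[str, list[bool]] = defaultdict(list)
    for test, passed in results.items():
        groups[group_key(test)].append(passed)
    return sum(sum(members) / len(members) for members in groups.values())

# Example: 8 variants of one test, 3 passing, scores 3/8 for the group.
results = {f"/css/foo.html?variant={i}": i < 3 for i in range(8)}
assert interop_style_score(results) == 3 / 8
```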

foolip commented Feb 17, 2023

@jensimmons I was curious about the situation with offscreen canvas so I took a closer look today.

First, the CSV file confirms the change (~4894 → ~3288, a reduction of ~1600) was between STP 160 and 161.

I took a look at the list of fixed tests, and the vast majority of tests with fixes are in /html/canvas/offscreen, over 1600. Exactly how much that changed the BSF score I can't tell, but it does seem like it would be the biggest explanation for the drop.

Are there a bunch of other tests that depend on offscreen canvas but don't need to? A combination of the above diff and searching for "OffscreenCanvas" in the repo turns up "only" transfer-errors.window.html, 8 in imagebitmap-renderingcontext/, and 26 in webcodecs/.

If there's something more than meets the eye here I'd love to take a closer look, but from this first look it does seem like the change is mostly due to offscreen canvas tests.

Of course this still leaves the problem that there are lots of tests for offscreen canvas and it gets an unreasonable weight, but that's something I think we can address by filtering by directory, feature, or spec, including standardization venue and status.
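For what it's worth, here is a minimal sketch of the kind of per-directory breakdown and directory filtering described above, under the assumption that the input is simply an iterable of failing test paths (the real data lives in the wpt.fyi results and the BSF CSV, which this does not parse):

```python
from collections import Counter
from typing import Iterable

def failures_by_directory(failures: Iterable[str], depth: int = 3) -> Counter:
    """Count browser-specific failures per directory prefix (default depth 3)."""
    counts: Counter = Counter()
    for test in failures:
        parts = test.strip("/").split("/")[:-1]          # drop the file name
        counts["/" + "/".join(parts[:depth])] += 1
    return counts

def filtered_count(failures: Iterable[str], excluded: tuple[str, ...]) -> int:
    """BSF count with some directories excluded, as one possible filter."""
    return sum(1 for test in failures if not test.startswith(excluded))

# Example: attribute a drop to a directory, then exclude that directory
# to see how much it alone contributes to the total.
failures = [
    "/html/canvas/offscreen/fill-and-stroke-styles/2d.gradient.radial.cone.html",
    "/webcodecs/video-frame.any.html",
]
print(failures_by_directory(failures).most_common(5))
print(filtered_count(failures, ("/html/canvas/offscreen",)))
```

Weighting by feature or filtering by spec maturity would need extra metadata (e.g. a mapping from directory to spec and its standardization status), but the shape of the computation would be the same.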
