Agenda for Feb 16, 2023 #287

Closed
foolip opened this issue Feb 16, 2023 · 2 comments
Labels
agenda Agenda item for the next meeting

Comments

foolip commented Feb 16, 2023

Here is the proposed agenda for our meeting on Feb 16, 2023:

Apologies for posting this agenda on the same day. If we move forward with #102 we'll need to get better about this, announcing at least 48 hours in advance.

Previous meeting: #278

foolip added the agenda label on Feb 16, 2023
nairnandu commented:

Attendees: @bkardell, @boazsender, @foolip, @jensimmons, @meyerweb, @nairnandu, @nt1m, @Rnyman, @tantek, @zcorpan

  • Updates on action items from last week:
  • Scoring: variants and multi-globals #256
    • [foolip] Agreement on the ideal behavior for this issue would be a good start
    • [gsnedders] Scoring as 1/8th would be a reasonable solution
    • [foolip] We need to find a volunteer to do this work
    • [gsnedders] Do we want to retroactively apply this for 2023?
    • [jensimmons] We should stage it first so that we can review the impact
    • [foolip] Consensus on the approach: when we have variants, we can group them using information from the manifest, score them individually, and divide by the number of variants in the group. The same applies to multi-global tests. (A rough sketch of this rule follows these notes.)
  • Finish the Interop Team Charter #275
    • [foolip] This should be finished before Interop 2024
    • [jensimmons] Let's separate them into different issues, so that we can talk about each one independently. We can aim to make these decisions before June.
    • AI: foolip will file independent issues
    • [foolip] Browser specific failures - on the question of having this be part of Interop scope - does anyone feel strongly about this?
    • [jensimmons] The browser-specific failures graph is problematic in that there is a perception that it covers all tests. Should we have more than one graph? We want to form an opinion on what the goal with graphs should be. I am good with this being discussed in this group (vs. the WPT group)
    • [zcorpan] Could this graph be per directory?
    • [gsnedders] Many things were proposed in the initial RFC. We do need to decide if this is a topic for Interop team
    • [boazsender] I do have some thoughts on this
    • [foolip] What would be the desired outcome?
    • [boazsender] Codify our perspective on the messaging we want to send out with WPT.
    • [jensimmons] We could re-design the homepage to make sure it aligns with the charter
    • [boazsender] There are stakeholders that are not part of the Interop team meeting. We should survey or get inputs from all contributors to WPT
    • [gsnedders] Should the default be status quo? Not deleting the graph because we can’t arrive at a consensus
    • [jensimmons] Maybe it should not be a binary decision. We definitely need to make improvements and that should be in the charter.
    • [bkardell] We should not leave it at status quo
    • [jensimmons] Agree. We do have a strong appetite to put effort into the re-design
    • [boazsender] In the very early days of WPT (at TPAC), there was concern about making wpt.fyi like an ACID test. Bringing that back to this group so that we keep it in mind when we redesign.
    • [jensimmons] The data itself is misleading. Safari dropped from ~5k to ~3k browser-specific failures because we shipped one feature (offscreen canvas). We were failing some of the tests because they had dependencies on the offscreen canvas feature. That graph is more reflective of how the two engines (Chromium and WebKit) differ in the way testing (and test workflow integration) is done.
    • [boazsender] The solution that group came up with was to just show the Interop score
    • [foolip] We could do two things for cases like offscreen canvas:
      • Weighting by feature
      • Filter by spec maturity
    • [jensimmons] We were failing ~1700 tests for features that Safari had shipped, but those tests depended on offscreen canvas, which we had not shipped at the time.
    • [nt1m] A metric or graph that covers the whole WPT project would be difficult to keep track of. Filtering would be a good first step.
    • Consensus: Yes to working on presenting the overall tests as part of defining the charter.
  • Reconsider the browser specific failures graph #270
    • This depends on the charter discussion
  • Review Interop 2022 dashboard changes #283
    • Questions:
    • [gsnedders] The scores above are based on the overall score, but the graph does not end at EOY
    • [jensimmons] The data on the dashboard should be frozen at the EOY (Dec. 31st). Should be static.
    • [tantek] I would have preferred that older dashboards retain the display that was decided for them with all their context at the time.
    • [zcorpan] Agree to switching to a static page
    • [tantek] If we are freezing the scores, we should rename the header to read something other than a “dashboard”. Like “Interop 2022 conclusion” or “Interop 2022 results”.
    • [foolip] Should we change the 2022 page to the old design?
    • [jensimmons] Agree with the idea of making any yearly changes to the dashboard be scoped to that particular year
    • [tantek] It sets the policy moving forward. Freeze numbers and presentation of the dashboard at the end of the year.
    • [jensimmons] If someone re-writes a test and that impacts a previous score - the old scores or static dashboard should not change
    • [nt1m] Changing the graph per focus area will be hard if we make that change
    • [foolip] We can freeze the data behind the graph
  • Handling test renames or splits (test suite evolution over time) #284
    • Not discussed, but is largely addressed by freezing past year’s dashboards.
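A rough sketch of the variant-grouping rule agreed above, assuming a simplified setup: results come in as a mapping from concrete test (including ?variant query strings and generated .any.*.html multi-global scopes) to pass/fail, and the manifest lookup is approximated by stripping those suffixes. The real scoring pipeline works from the WPT manifest, so treat this as an illustration of the math rather than the implementation.

```python
from collections import defaultdict
import re

# Approximation of "group using information from the manifest":
# collapse ?variant query strings and .any.* multi-global scopes
# into one logical test. (The manifest would give this directly.)
MULTI_GLOBAL_RE = re.compile(r"\.any(\.[a-z-]+)?\.html$")

def group_key(test_path: str) -> str:
    base = test_path.split("?", 1)[0]            # drop the variant query
    return MULTI_GLOBAL_RE.sub(".any.js", base)  # collapse generated globals

def interop_style_score(results: dict[str, bool]) -> float:
    """Score each group as the mean of its members, then sum the groups.

    A test with eight variants contributes at most 1 point in total,
    so a single variant is worth 1/8 rather than a whole point.
    """
    groups: dict[str, list[bool]] = defaultdict(list)
    for test, passed in results.items():
        groups[group_key(test)].append(passed)
    return sum(sum(members) / len(members) for members in groups.values())

# Example: 8 variants of one test, 3 passing, scores 3/8 for the group.
results = {f"/css/foo.html?variant={i}": i < 3 for i in range(8)}
assert interop_style_score(results) == 3 / 8
```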

foolip commented Feb 17, 2023

@jensimmons I was curious about the situation with offscreen canvas so I took a closer look today.

First, the CSV file confirms the change (~4894 → ~3288, a reduction of ~1600) was between STP 160 and 161.

I took a look at the list of fixed tests, and the vast majority of tests with fixes are in /html/canvas/offscreen, over 1600. Exactly how much that changed the BSF score I can't tell, but it does seem like it would be the biggest explanation for the drop.

Are there a bunch of other tests that depend on offscreen canvas but don't need to? A combination of the above diff and searching for "OffscreenCanvas" in the repo turns up "only" transfer-errors.window.html, 8 in imagebitmap-renderingcontext/, and 26 in webcodecs/.

If there's something more than meets the eye here I'd love to take a closer look, but from this first look it does seem like the change is mostly due to offscreen canvas tests.

Of course this still leaves the problem that there are lots of tests for offscreen canvas and it gets an unreasonable weight, but that's something I think we can address by filtering by directory, feature, or spec, including standardization venue and status.
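For what it's worth, here is a minimal sketch of the kind of per-directory breakdown and directory filtering described above, under the assumption that the input is simply an iterable of failing test paths (the real data lives in the wpt.fyi results and the BSF CSV, which this does not parse):

```python
from collections import Counter
from typing import Iterable

def failures_by_directory(failures: Iterable[str], depth: int = 3) -> Counter:
    """Count browser-specific failures per directory prefix (default depth 3)."""
    counts: Counter = Counter()
    for test in failures:
        parts = test.strip("/").split("/")[:-1]          # drop the file name
        counts["/" + "/".join(parts[:depth])] += 1
    return counts

def filtered_count(failures: Iterable[str], excluded: tuple[str, ...]) -> int:
    """BSF count with some directories excluded, as one possible filter."""
    return sum(1 for test in failures if not test.startswith(excluded))

# Example: attribute a drop to a directory, then exclude that directory
# to see how much it alone contributes to the total.
failures = [
    "/html/canvas/offscreen/fill-and-stroke-styles/2d.gradient.radial.cone.html",
    "/webcodecs/video-frame.any.html",
]
print(failures_by_directory(failures).most_common(5))
print(filtered_count(failures, ("/html/canvas/offscreen",)))
```

Weighting by feature or filtering by spec maturity would need extra metadata (e.g. a mapping from directory to spec and its standardization status), but the shape of the computation would be the same.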
