
"generic" or "comparable" tests -- should we be able to compare test results across NVDA, JAWS and VoiceOver? #336

Open
spectranaut opened this issue Nov 19, 2020 · 2 comments
Labels: Agenda+Community Group (to discuss in the next workstream summary meeting, usually the last teleconference of the month)

@spectranaut (Contributor)

Hi everyone!

Some of us at Bocoup have realized that a change in the way tests are written makes them not comparable across ATs. You can see this now on the index page for each design pattern; in the checkbox tests, for example, there are tests written for NVDA and JAWS, and completely different tests written for VoiceOver.

Our tests are written around actions a user would take, for example, "Navigate to an unchecked checkbox". Now imagine a future where we have test results: the way we are writing tests now, you cannot easily get a sense of the result of a single test across all screen readers, because this single test has become three tests:

  • "Navigate to an unchecked checkbox in reading mode"
  • "Navigate to an unchecked checkbox in interaction more"
  • "Navigate to an unchecked checkbox"
    • This last one is implicitly "Navigate to an unchecked checkbox using a modeless AT"

These tests all have the same assertions. All ATs, whether modeless or in a particular mode, should meet the same set of assertions after being directed to perform the same action. It is the same action and the same set of assertions that make them the same test; they should be comparable. If we want to summarize support for role=checkbox across screen readers, I think there should be just one line item for this test, namely "Navigate to an unchecked checkbox". The complexity of modes vs. no modes should be hidden from a user who wants to look at and understand the results.

I think there are two ways forward.

1. We remove "mode" as a top level concept.

I don't know whether "modes" are more or less common among all the ATs this project aims to eventually support. But even if modal ATs are in the majority, it seems like a strange concept to have at the top level when the tests are really about "actions" one can take. If an "action" can be performed in either mode, my advice would be to refactor the tests so that the list of commands includes the mode.
For a concrete example, let's combine "Navigate to an unchecked checkbox in interaction mode" and "Navigate to an unchecked checkbox in reading mode". For the JAWS version of this test, that means we'd see the following list of commands for "Navigate to an unchecked checkbox" (sketched as data after the list):

  • With Virtual Cursor on, use the command: X / Shift+X
  • With Virtual Cursor on, use the command: F / Shift+F
  • With Virtual Cursor on, use the command: Tab / Shift+Tab
  • With Virtual Cursor on, use the command: Up Arrow / Down Arrow
  • With Virtual Cursor on, use the command: Left Arrow / Right Arrow (with Smart Navigation on)
  • With PC Cursor on, use the command: Tab / Shift+Tab
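If it helps to picture it, here's a minimal sketch of what that merged test's command list could look like as data. All field names here are hypothetical and the VoiceOver entry is illustrative; this isn't the repo's actual test format:

```ts
// Hypothetical shape for recommendation 1: mode moves out of the test's
// identity and into each command entry, so one test covers every AT.
interface CommandEntry {
  at: "jaws" | "nvda" | "voiceover";
  // AT-specific setting the command assumes; null for modeless ATs.
  setting: string | null;
  keys: string;
}

const navigateToUncheckedCheckbox: CommandEntry[] = [
  { at: "jaws", setting: "Virtual Cursor on", keys: "X / Shift+X" },
  { at: "jaws", setting: "Virtual Cursor on", keys: "F / Shift+F" },
  { at: "jaws", setting: "Virtual Cursor on", keys: "Tab / Shift+Tab" },
  { at: "jaws", setting: "Virtual Cursor on", keys: "Up Arrow / Down Arrow" },
  { at: "jaws", setting: "Virtual Cursor on (Smart Navigation on)", keys: "Left Arrow / Right Arrow" },
  { at: "jaws", setting: "PC Cursor on", keys: "Tab / Shift+Tab" },
  // Illustrative only; actual VoiceOver commands would come from its test plan.
  { at: "voiceover", setting: null, keys: "VO+Right Arrow / VO+Left Arrow" },
];
```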

2. We have some way of indicating that these three tests test the same thing

We could add metadata to the tests so that we know they all test the same thing. We could use the existing metadata:

  1. The list of assertions. If the assertions are the same, the tests can be compared.
  2. The "task", which is JSON within the test, instead of the name of the file.

I'm not recommending this way forward for two reasons:

  1. Reason one is minor: we would have to inspect the file or the results to know whether a test was comparable, rather than going by the name of the test alone.
  2. Reason two is more major: going in this direction might lead people to get confused about the "concept" of a test. Specifically, what is it that a single "test" should be testing -- how granular or how wide, how generic or how specific? I think it will be easier to maintain and grow the repository, and to train new testers, if it is very clear what a single test file should encapsulate. For example, if we go with recommendation 1, the definitions are clearer: a test is an action a user takes with an AT, the commands are all the ways a user can perform that action with any specific AT, and the assertions are the bare minimum responses we expect from the AT. If we don't drop mode as a top-level concept, a test gets harder to define: a test is an action plus a mode, if there is a mode.
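To make those definitions concrete, a test under recommendation 1 could be modeled like this (a sketch only, with hypothetical names):

```ts
// One test = one user action. Commands enumerate every way each AT performs
// the action (including any mode or setting); assertions are the bare
// minimum responses expected from the AT.
interface Test {
  action: string;                     // "Navigate to an unchecked checkbox"
  commands: Record<string, string[]>; // AT name -> key sequences, setting noted inline
  assertions: string[];               // e.g. "Role 'checkbox' is conveyed"
}
```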
@jscholes (Contributor)

@spectranaut Some initial thoughts:

Within a test, we explicitly specify a task plus a mode, as two separate columns. But then, we also specify a title which, in the case of JAWS and NVDA at least, repeats both pieces of information. E.g.:

  • Task: navigate forwards to an unchecked checkbox
  • Mode: reading
  • Title: Navigate forwards to an unchecked checkbox in reading mode

What if we were to introduce a third value for the mode column (N/A, modeless or similar), and then use the combination of the task and mode columns to generate a title for testers? That way, the system could internally keep track of which task was applied to the same set of tests, while also reducing duplication for test developers.
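For example, a minimal sketch of that title generation (helper name hypothetical):

```ts
type Mode = "reading" | "interaction" | "n/a";

// Build the tester-facing title from the task and mode columns, so the task
// alone can key which tests are comparable across ATs.
function titleFor(task: string, mode: Mode): string {
  const title = task.charAt(0).toUpperCase() + task.slice(1);
  return mode === "n/a" ? title : `${title} in ${mode} mode`;
}

titleFor("navigate forwards to an unchecked checkbox", "reading");
// => "Navigate forwards to an unchecked checkbox in reading mode"
```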

Are there cases where the task wording should/does differ from that in the title? This could be a short-term optimisation as we discuss other ATs and approaches.

> These tests all have the same assertions. All ATs, whether modeless or in a particular mode, should meet the same set of assertions after being directed to perform the same action.

I'm not sure we can make that assertion for all patterns. For example, see #331 (comment), which implies that at some point, macOS may need a different role-based assertion to the one used on Windows. I guarantee that other similar cases will come up, particularly as we add additional ATs with different input modalities.

> If an "action" can be performed in either mode, my advice would be to refactor the tests so that the list of commands includes the mode.

There are definitely actions which it doesn't make sense to test in multiple modes. For instance, navigating to the first or last item of a combobox listbox pop-up using Home and End doesn't work unless the user is in interaction mode.

Zooming out a little, I know that further down the road, tests will need to specify something more complex than reading vs. interaction mode. For example, consider navigating to a checkbox on iOS. There are many factors (one possible shape is sketched after this list):

  • Different input modalities: touch, keyboard, speech control, switch control...
  • Within touch, rotor versus swiping versus exploration
  • Within keyboard, Full Keyboard Access on versus off.
  • ... etc.
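A purely illustrative shape for such an abstraction: replace the binary mode with an open-ended set of AT settings, so these modalities fit the same model:

```ts
interface TestContext {
  at: string;                       // e.g. "voiceover-ios"
  inputModality: "touch" | "keyboard" | "speech" | "switch";
  // Modality-specific settings, e.g. { navigation: "rotor" }
  // or { fullKeyboardAccess: "on" }.
  settings: Record<string, string>;
}
```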

I don't know how much of that this project will aim to test. But if now is the right time to decide to remove modes as a top-level concept altogether, then new abstractions will be needed in the future, following detailed discussion of different ATs and their paradigms. Right now, I would lobby for less duplication within tests.

@spectranaut (Contributor, Author)

Thanks @jscholes for bringing so much nuance to the discussion!
