
Indicator coverage: visualisation in browser #103

Open
pindec opened this issue Jul 23, 2021 · 10 comments

Comments

pindec commented Jul 23, 2021

We could:

  • add a new "indicators" tab to the DRT to avoid overcrowding one screen
  • group the indicators by topic, creating a panel for each.
    • given the large number of indicators, consider only expanding the top panel on load
  • for each indicator, show measurability and indicator coverage (see Indicator coverage visualisation: calculation #100)
  • for each indicator with coverage < 100%, provide expandable scrolling sections for missing fields and related coverage (there may be many missing fields).
    e.g. [wireframe image] (see also the data sketch below)

(image updated to show indicators ordered by indicator coverage desc)
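To make the proposed structure concrete, here is a minimal sketch of the data that could drive such a tab, assuming a plain Python representation with hypothetical key names and made-up numbers (the DRT's actual data model may differ):

```python
# Hypothetical shape of the data behind the proposed "indicators" tab.
# All keys, names and numbers are illustrative, not the DRT's actual model.
indicators_tab = {
    "topics": [
        {
            "name": "Efficiency",
            "expanded": True,  # only the top panel is expanded on load
            "indicators": [
                {
                    "question": "Are projects delivered within budget?",
                    "coverage": 72.0,    # % of projects containing all required fields (see #100)
                    "missing_fields": [  # shown only when coverage < 100%
                        {"path": "completion/finalValue/amount", "coverage": 72.0},
                        {"path": "budget/amount/amount", "coverage": 95.0},
                    ],
                },
            ],
        },
        # ... further topics, collapsed on load
    ],
}

# Order indicators within each topic by coverage desc, as in the updated wireframe.
for topic in indicators_tab["topics"]:
    topic["indicators"].sort(key=lambda i: i["coverage"], reverse=True)
```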

jpmckinney (Member)

@sabahfromlondon If you have time, happy to get your input on how to integrate this information. I can give more background on a call.

jpmckinney (Member)

As in #102 (comment), all the percentages should mean the same thing: a high % means high coverage. Here, the missing fields table uses a high % to mean low coverage.
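If it helps to see the change mechanically, it is just a normalisation of the missing-fields figure onto the same scale; a minimal sketch, with made-up numbers:

```python
def as_coverage(missing_pct: float) -> float:
    """Convert '% of projects missing the fields' into a coverage %,
    so that a high number always means high coverage (illustrative only)."""
    return 100.0 - missing_pct

print(as_coverage(80.0))  # 80% missing -> 20.0% coverage
```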

sabahfromlondon

@jpmckinney yes let's have a call to discuss. Happy to provide input :)

pindec (Author) commented Jul 26, 2021

I agree that consistency is good, but I'm not clear on what the best basis for consistency is here: whether higher should mean the user can/should take action, or whether higher should mean 'better' on the calculation.

Higher means action point:

  • As outlined in Indicator coverage visualisation: calculation #100, "high" is "good" for indicator coverage (i.e. a larger proportion of projects contain all the required fields), and the user is more likely to take action (i.e. calculate the indicator) for the higher scores.
  • We could order the indicator list in each section by coverage desc so that the user sees the indicators they can calculate at the top (or group by question and then by coverage, so that the same questions are kept together). (Note: I updated the wireframe above to show ordering by coverage desc.)
  • For missing fields coverage, "high" means a larger number of projects are missing the required fields, so high in this context is 'bad'. However, it also marks an action point: the user can use these 'high' scores to prioritise which fields to address, and the list can likewise be ordered by coverage desc to support that.

Higher is better on calculation:

  • For "missing fields", we do not want to list the "best" fields that have 100% coverage in projects, since that means they are not by definition missing fields, and there is no user action, but omitting them might be confusing to the user.
  • To ensure the actionable items are at the top to help users, we'd have to order the fields list by coverage asc by default, which also might be confusing in the context of an indicator list ordered by coverage desc.

Would it help to rename the "missing fields coverage" so that we are not using the term 'coverage' in both?
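For comparison, the two orderings above could look like this (illustrative Python with made-up numbers, not the DRT's implementation):

```python
# Illustrative only: the two orderings discussed above.
fields = [
    {"path": "budget/amount/currency", "coverage": 2.0, "missing": 98.0},
    {"path": "budget/amount/amount", "coverage": 95.0, "missing": 5.0},
    {"path": "completion/finalValue/amount", "coverage": 72.0, "missing": 28.0},
]

# "Higher means action point": show % missing, sorted descending.
by_missing_desc = sorted(fields, key=lambda f: f["missing"], reverse=True)

# "Higher is better on calculation": show coverage, sorted ascending, so the
# actionable (low-coverage) fields still come first.
by_coverage_asc = sorted(fields, key=lambda f: f["coverage"])

# Both orderings put budget/amount/currency first; only the displayed metric
# and the sort direction differ.
```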

jpmckinney (Member) commented Jul 26, 2021

I am not familiar with a mental model in which a big bar means "take action". Typically, in visualizations, the results that require action are indicated using color (red), icons (alert) or text (the action to take).

Having the bar match field coverage fits a mental model that works in the real world. If a basket is 5% full of apples, the apples will fill only a small portion of the basket: 5% is small, the amount of apples is small, so we have consistency. If a person's job is to fill the baskets to 100%, "small = action" comes very naturally. We could say the basket is 95% empty of apples, but we typically do not think in terms of the share of missingness, whether in real-world or data scenarios.

In many dashboards (including Pelican), higher = better, consistently, and users are comfortable and quick to identify "oh, that bar is short, I should work to fix that." Bar length doesn't have a strong association with a need for action, one way or the other; it requires interpretation.


For indicator sorting, I think it is better to maintain the same order across presentations. People who use the tool more than once will start to remember which indicator appears where. Using the same order avoids user errors like assuming that the first indicator is the same as on their last visit.

jpmckinney (Member) commented Jul 26, 2021

For this table, there are a few simplifications to make: [table screenshot]

When there are many columns and rows, the user has a harder time determining what to focus on. We can reduce the number of columns by:

  1. Eliminating "Measurable". This column has the same meaning as an indicator coverage of 0%. If we want to emphasize (or de-emphasize) that it is 0%, we can do that with color, font weight, etc. Or, we can put "N/A" if none of the required fields are present in the dataset (since 0% can also be achieved by the fields never appearing in combination). (See the sketch after this list.)
  2. Merging "Question" and "Indicator" into "Indicator". The current Indicator column is just the indicator's methodology; it is explanatory text that a user can learn to ignore once they are familiar with the indicators. It can be expressed as a second line below the question, perhaps with a "Methodology: " label, and perhaps with a lighter font color or font weight, to de-emphasize it, since it is explanatory text, not feedback text.
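A small sketch of the "N/A" vs "0%" distinction from point 1, assuming a hypothetical helper that knows which fields appear anywhere in the dataset (names are illustrative only):

```python
# Hypothetical: `fields_in_dataset` is the set of field paths that appear at
# least once anywhere in the dataset.
def coverage_cell(required_fields, fields_in_dataset, combined_coverage):
    """Return the value to display in the indicator coverage column."""
    if not any(field in fields_in_dataset for field in required_fields):
        return "N/A"  # none of the required fields ever appear
    return f"{combined_coverage:.0f}%"  # 0% is possible if fields never co-occur

print(coverage_cell(["budget/amount/amount"], set(), 0.0))                     # N/A
print(coverage_cell(["budget/amount/amount"], {"budget/amount/amount"}, 0.0))  # 0%
```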

The three-level hierarchy (topic, indicator, fields) with two expandable elements (topic, fields) is also an issue. Having expandable topics is good, since we have 7-8 topics and 37 indicators; a user might only be interested in specific topics, for example. Having a second expandable element, however, introduces UX issues (@sabahfromlondon can share from our user testing of the DRT error report).

Turning to the missing fields table: I am not clear on why the rows have multiple fields. method_1 for cost overruns depends on 5 fields: id, budget/amount/amount, budget/amount/currency, completion/finalValue/amount, completion/finalValue/currency. If my data has 95% coverage for amounts, but 2% coverage for currencies (maybe I only set the currency when it is a foreign currency), then reporting an amount-currency pair together will show 2%. As a user, I might have a hard time reconciling this with my prior knowledge, since I might know that my amount coverage is high. Furthermore, knowing that the pair is 2% doesn't tell me whether I need to improve amount, currency, or both. So, I think each field needs its own row.

Of course, another dataset could have 50% amount coverage and 50% currency coverage, but 0% combined coverage (if they are never used together), in which case the user could be confused about why the overall coverage isn't 50%. That said, I would still report each field individually, because (1) I believe this scenario is extremely rare - most of the time, the overall coverage will just be a bit lower than the minimum coverage of the required fields - and (2) reporting fields individually swaps the (frequently encountered) confusing scenario in the previous paragraph for this (rarely encountered) one.
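To make the two scenarios concrete, here is an illustrative calculation (not the DRT's implementation; each project is reduced to the set of fields it contains):

```python
def field_coverage(projects, field):
    """% of projects in which a single field is present (illustrative only)."""
    return 100.0 * sum(field in project for project in projects) / len(projects)

def combined_coverage(projects, fields):
    """% of projects in which all of the required fields are present together."""
    return 100.0 * sum(all(f in project for f in fields) for project in projects) / len(projects)

# Frequent scenario: amount coverage is high, currency coverage is low.
projects = [{"amount"}] * 93 + [{"amount", "currency"}] * 2 + [set()] * 5
print(field_coverage(projects, "amount"))                   # 95.0
print(field_coverage(projects, "currency"))                 # 2.0
print(combined_coverage(projects, ["amount", "currency"]))  # 2.0

# Rare scenario: each field at 50%, but never used together.
projects = [{"amount"}] * 50 + [{"currency"}] * 50
print(field_coverage(projects, "amount"))                   # 50.0
print(combined_coverage(projects, ["amount", "currency"]))  # 0.0
```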

jpmckinney (Member)

Also, as with field coverage, we should use a bar to show coverage, which makes it much faster to identify low coverage than reading numbers (especially if all the numbers have the same number of digits, like 10-99).

sabahfromlondon commented Jul 26, 2021

I'm new to OC4IDS, so will limit myself to UX comments.

With the testing that we did for the regular Data Review Tool, we found that it was easy for issues to get missed. I think that having the second expandable element will exacerbate this issue. A single dropdown, e.g. Efficiency, should expand all the issues within that category. I would also add the number of issue types found for that category, e.g. "Efficiency - Three issue types found".

Second, on removing columns that we don't need: these really need to be tested with users. We found that some of the regular DRT columns were not clear to users and others did not add value.

High being good in some contexts and bad in others won't work; it places too much cognitive load on the user. Number of issues keeps things simple, and, at a high level, we used a meter visual so that users know whether their dataset is perfect, good, bad or uncheckable.

Happy to share screens. Hope this helps.

pindec (Author) commented Jul 28, 2021

Thanks both. Given the sprint focus on field-level coverage, I'll defer to that for now and we can pick up this discussion in future, if that suits you.

sabahfromlondon

Suits me. Feel free to reach out directly any time :)
