Skip to content

Use a list to log resolved ACRs in events#11307

Merged
lmgeorge merged 4 commits intomainfrom
lmgeorge/reporting-semantic-acr-values
Oct 9, 2024
Merged

Use a list to log resolved ACRs in events#11307
lmgeorge merged 4 commits intomainfrom
lmgeorge/reporting-semantic-acr-values

Conversation

@lmgeorge
Copy link
Contributor

@lmgeorge lmgeorge commented Oct 2, 2024

🛠 Summary of changes

Why

  • Deter using the resolved ACRs directly in reporting. The resolved list of ACRs (or Vectors of Trust) is volatile and subject to relatively frequent change as service features are added, removed, or modified over time. Instead, most analytics users should use the resolved attribute flags (e.g., facial_match or enhanced_ipp) for event filtering. Going forward, the purpose of the sp_request.component_values field should be solely to support investigating unexpected behavior.

  • Eliminate unexpected "reformatting" or "truncation" of resolved ACR names. While truncating these strings makes composing queries simpler, it introduces complexity and confusion when triaging and debugging requests. Originally, the purpose was to be able to do simple filters like properties.sp_request.component_values.Pb, but with the deprecation of the VoT interfaces (and thereby the 2-character string names), the filter becomes sp_request.component_values.`urn:acr.login.gov:verified` , which is much less legible than: 'urn:acr.login.gov:verified' IN sp_request.component_values

  • Why not store component values as a string? Basically, VoT/vtr style components and ACR style components "stringify" differently, which would make query composition even more complex, instead of less.

How

  • Adds an array of strings containing the ACR name will be stored in sp_request.component_names.
  • No other changes have been made to structure of sp_request.
  • New specs have been added for all IAL ACRs.

changelog: Internal, Analytics, Streamlining inclusion of resolved ACR names

resolves: https://gitlab.login.gov/lg-people/Melba/backlog-fy24/-/issues/112

🎫 Ticket

Link to the relevant ticket: https://gitlab.login.gov/lg-people/Melba/backlog-fy24/-/issues/112

📜 Testing Plan

  1. Create an account via the sample app
    • Note: Make sure to test with both an IdV enabled and disabled SP
  2. Check the events.log (locally or in Cloudwatch) and verify that components_values is displaying as an array
  3. Repeat steps 1-2 for auth-only user, verified user, a verified user that has been proofed with selfie (facial match).
    • Note: Pay special attention when going through the "facial match preferred" flow.

Additional notes

This will require updating Cloudwatch queries. Impacted ones are listed in the linked ticket.

@lmgeorge lmgeorge changed the title Use a list to log resolved ACRs in events [Do not review] Use a list to log resolved ACRs in events Oct 2, 2024
@lmgeorge lmgeorge force-pushed the lmgeorge/reporting-semantic-acr-values branch from 9eef77e to a977a80 Compare October 2, 2024 21:37
@lmgeorge lmgeorge requested a review from Sgtpluck October 2, 2024 22:15
@lmgeorge lmgeorge force-pushed the lmgeorge/reporting-semantic-acr-values branch from adb197e to d7efa10 Compare October 4, 2024 02:35
@lmgeorge lmgeorge marked this pull request as ready for review October 4, 2024 02:40
@lmgeorge lmgeorge changed the title [Do not review] Use a list to log resolved ACRs in events Use a list to log resolved ACRs in events Oct 4, 2024
@lmgeorge lmgeorge self-assigned this Oct 4, 2024
@lmgeorge lmgeorge requested a review from zachmargolis October 4, 2024 02:42
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if we switched the name on this? because changing the type of this means querying this field before/after the change would get tricky

Suggested change
attributes[:component_values] = resolved_result.component_names
attributes[:component_names] = resolved_result.component_names

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, are you proposing eliminating component_values and adding component_names?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah to keep the types clearer when querying across old + new events

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, i think that does make sense

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not totally sure that is needed if only because all impacted production queries (5 total) wouldn't be referencing this field any more (they should all be switched to using sp_request.facial_match or sp_request.identity_proofing), but maybe I'm not fully grokking the concern. I'll ping you offline.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zachmargolis wanted to make sure this felt resolved to you. it still feels like it makes sense to add a new data element rather than alter the old one, in case anyone is using the old value in ad-hoc queries. but i'll defer to whatever decision ya'll have come to.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I was writing this to optimize for querying, I would:

  • keep the hash-style format (it's easier to query in Cloudwatch)

If we are set changing to using a list, then I would:

  • change the key name to make it easier to query sub-fields across future + past events

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify a particular point: We don't want to optimize for querying. The logged ACR values mean different things at different points in the identity verification and authentication request flows. The more useful information is captured in the rest of the sp_request attributes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to step back a bit, i think what we were trying to do with this change was create a consistent way for this information to be logged/queried, since the original implementation doesn't really translate to the new acr values (ie, some requests would have ial/2 => true, and others would have url:acr.login.gov:verified => true).

while the ticket i wrote originally said that we should update the hash syntax to array, zach's pointed out there is value in keeping the type of data consistent over time. so i am now thinking we should keep that value the way it was, and add a new key name for the full list of component names. i'll update the ticket to indicate as such.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been updated to maintain the old sp_request.component_values behavior and adds the sp_request.component_names field.

lmgeorge and others added 4 commits October 7, 2024 19:12
Why
===

- Deter using the resolved ACRs directly in reporting. By nature, the
  resolved list of ACRs (or Vectors of Trust) is volatile and subject
  to change as service features are added over time. Instead, most users
  should use the resolved attribute flags (e.g., `facial_match` or
  `enhanced_ipp`) for event filtering. In effect, the purpose of the
  sp_request.component_values field is to support investigating
  unexpected behavior.

- Eliminate unexpected "reformatting" or "truncation" of resolved
  ACR names. While truncating these strings makes composing queries
  simpler, it introduces complexity and confusion when triaging and
  debugging requests.
  Originally, the purpose was to be able to do simple filters like
  `properties.sp_request.component_values.Pb`, but with the
  deprecation of the VoT interfaces, the filter becomes
  'sp_request.component_values.`urn:acr.login.gov:verified`', which is
  much less legible than:
  `'urn:acr.login.gov:verified' IN sp_request.component_values`
- Why not store component values as a string? Basically, VoT/vtr
  style components and ACR style components "stringify" differently,
  which would make query composition even more complex, instead of less.

How
===

- Instead of using a hash to map the resolved ACRs to a boolean, an
  array of strings containing the ACR name will be stored in
  `sp_request.component_values`.
- No other changes have been made to structure of `sp_request`.

changelog: Internal, Analytics, Streamlining inclusion of resolved ACR names

resolves: https://gitlab.login.gov/lg-people/Melba/backlog-fy24/-/issues/112
@lmgeorge lmgeorge force-pushed the lmgeorge/reporting-semantic-acr-values branch from 1a6f7a1 to 0080ca4 Compare October 7, 2024 23:12
@lmgeorge lmgeorge requested a review from zachmargolis October 7, 2024 23:17
@lmgeorge lmgeorge merged commit 0e9bba2 into main Oct 9, 2024
@lmgeorge lmgeorge deleted the lmgeorge/reporting-semantic-acr-values branch October 9, 2024 16:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants