Use a list to log resolved ACRs in events#11307
Conversation
9eef77e to
a977a80
Compare
adb197e to
d7efa10
Compare
app/services/analytics.rb
Outdated
There was a problem hiding this comment.
what if we switched the name on this? because changing the type of this means querying this field before/after the change would get tricky
| attributes[:component_values] = resolved_result.component_names | |
| attributes[:component_names] = resolved_result.component_names |
There was a problem hiding this comment.
Hmm, are you proposing eliminating component_values and adding component_names?
There was a problem hiding this comment.
yeah to keep the types clearer when querying across old + new events
There was a problem hiding this comment.
ah, i think that does make sense
There was a problem hiding this comment.
I'm not totally sure that is needed if only because all impacted production queries (5 total) wouldn't be referencing this field any more (they should all be switched to using sp_request.facial_match or sp_request.identity_proofing), but maybe I'm not fully grokking the concern. I'll ping you offline.
There was a problem hiding this comment.
@zachmargolis wanted to make sure this felt resolved to you. it still feels like it makes sense to add a new data element rather than alter the old one, in case anyone is using the old value in ad-hoc queries. but i'll defer to whatever decision ya'll have come to.
There was a problem hiding this comment.
If I was writing this to optimize for querying, I would:
- keep the hash-style format (it's easier to query in Cloudwatch)
If we are set changing to using a list, then I would:
- change the key name to make it easier to query sub-fields across future + past events
There was a problem hiding this comment.
Just to clarify a particular point: We don't want to optimize for querying. The logged ACR values mean different things at different points in the identity verification and authentication request flows. The more useful information is captured in the rest of the sp_request attributes.
There was a problem hiding this comment.
to step back a bit, i think what we were trying to do with this change was create a consistent way for this information to be logged/queried, since the original implementation doesn't really translate to the new acr values (ie, some requests would have ial/2 => true, and others would have url:acr.login.gov:verified => true).
while the ticket i wrote originally said that we should update the hash syntax to array, zach's pointed out there is value in keeping the type of data consistent over time. so i am now thinking we should keep that value the way it was, and add a new key name for the full list of component names. i'll update the ticket to indicate as such.
There was a problem hiding this comment.
This has been updated to maintain the old sp_request.component_values behavior and adds the sp_request.component_names field.
Why === - Deter using the resolved ACRs directly in reporting. By nature, the resolved list of ACRs (or Vectors of Trust) is volatile and subject to change as service features are added over time. Instead, most users should use the resolved attribute flags (e.g., `facial_match` or `enhanced_ipp`) for event filtering. In effect, the purpose of the sp_request.component_values field is to support investigating unexpected behavior. - Eliminate unexpected "reformatting" or "truncation" of resolved ACR names. While truncating these strings makes composing queries simpler, it introduces complexity and confusion when triaging and debugging requests. Originally, the purpose was to be able to do simple filters like `properties.sp_request.component_values.Pb`, but with the deprecation of the VoT interfaces, the filter becomes 'sp_request.component_values.`urn:acr.login.gov:verified`', which is much less legible than: `'urn:acr.login.gov:verified' IN sp_request.component_values` - Why not store component values as a string? Basically, VoT/vtr style components and ACR style components "stringify" differently, which would make query composition even more complex, instead of less. How === - Instead of using a hash to map the resolved ACRs to a boolean, an array of strings containing the ACR name will be stored in `sp_request.component_values`. - No other changes have been made to structure of `sp_request`. changelog: Internal, Analytics, Streamlining inclusion of resolved ACR names resolves: https://gitlab.login.gov/lg-people/Melba/backlog-fy24/-/issues/112
…onent_names, fix up tests
1a6f7a1 to
0080ca4
Compare
🛠 Summary of changes
Why
Deter using the resolved ACRs directly in reporting. The resolved list of ACRs (or Vectors of Trust) is volatile and subject to relatively frequent change as service features are added, removed, or modified over time. Instead, most analytics users should use the resolved attribute flags (e.g.,
facial_matchorenhanced_ipp) for event filtering. Going forward, the purpose of thesp_request.component_valuesfield should be solely to support investigating unexpected behavior.Eliminate unexpected "reformatting" or "truncation" of resolved ACR names. While truncating these strings makes composing queries simpler, it introduces complexity and confusion when triaging and debugging requests. Originally, the purpose was to be able to do simple filters like
properties.sp_request.component_values.Pb, but with the deprecation of the VoT interfaces (and thereby the 2-character string names), the filter becomessp_request.component_values.`urn:acr.login.gov:verified`, which is much less legible than:'urn:acr.login.gov:verified' IN sp_request.component_valuesWhy not store component values as a string? Basically, VoT/vtr style components and ACR style components "stringify" differently, which would make query composition even more complex, instead of less.
How
sp_request.component_names.sp_request.changelog: Internal, Analytics, Streamlining inclusion of resolved ACR names
resolves: https://gitlab.login.gov/lg-people/Melba/backlog-fy24/-/issues/112
🎫 Ticket
Link to the relevant ticket: https://gitlab.login.gov/lg-people/Melba/backlog-fy24/-/issues/112
📜 Testing Plan
components_valuesis displaying as an arrayAdditional notes
This will require updating Cloudwatch queries. Impacted ones are listed in the linked ticket.