[HACKATHON] Initial crack at an IdV matcher #11624
solipet merged 3 commits into login-hackathon-2024-user-narrative from
Matcher is a state machine that collects IDV "attempts" as they happen and tries to suss out interesting things about them. [skip changelog]
    attr_reader :current_idv_attempt
    attr_reader :idv_attempts
When we encounter a new "welcome submitted" event, a new IdvAttempt is created and stored in current_idv_attempt. Subsequent events then update the state of that attempt. If we see another "welcome submitted" event, the current_idv_attempt is put in idv_attempts and a new one is started.
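A minimal sketch of the attempt-tracking behavior described above, in plain Ruby. Only `current_idv_attempt` and `idv_attempts` come from the diff; the event name, the `IdvAttempt` fields, and the `AttemptCollector`, `handle_event`, and `finish_current_idv_attempt` names are hypothetical stand-ins, not the PR's code.

```ruby
# frozen_string_literal: true

# Hypothetical record of one IdV attempt: when it started and the names of
# the events seen while it was in progress.
IdvAttempt = Struct.new(:started_at, :event_names, keyword_init: true)

class AttemptCollector
  # Assumed event name; the PR's actual constant may differ.
  WELCOME_SUBMITTED_EVENT = 'IdV: doc auth welcome submitted'

  attr_reader :current_idv_attempt, :idv_attempts

  def initialize
    @current_idv_attempt = nil
    @idv_attempts = []
  end

  def handle_event(event)
    if event['name'] == WELCOME_SUBMITTED_EVENT
      # Another "welcome submitted" closes out the attempt in progress.
      finish_current_idv_attempt
      @current_idv_attempt = IdvAttempt.new(started_at: event['@timestamp'], event_names: [])
    end

    # Every event, including the one that started the attempt, updates the
    # attempt in progress. Events before the first welcome are ignored.
    current_idv_attempt.event_names << event['name'] if current_idv_attempt
  end

  # Moves the in-progress attempt (if any) into idv_attempts.
  def finish_current_idv_attempt
    idv_attempts << current_idv_attempt if current_idv_attempt
    @current_idv_attempt = nil
  end
end
```

In this sketch the caller has to call `finish_current_idv_attempt` after the last event; otherwise the final attempt is never moved into `idv_attempts`.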
    when RATE_LIMIT_REACHED_EVENT
      handle_rate_limit_reached(event:)
This does not use `for_current_idv_attempt` because not all rate limits are IdV-related. We need to figure out which ones are first.
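To illustrate the distinction, here is a hypothetical sketch in which only rate limits from an allowlist of IdV limiters are attached to the current attempt, and everything else is kept aside. The `limiter_type` property path, the limiter names, and the `RateLimitRouting` class are assumptions; deciding which limiters really count as IdV-related is exactly the open question in the comment.

```ruby
# frozen_string_literal: true

class RateLimitRouting
  # Assumed limiter names; the real set is still to be determined.
  IDV_LIMITERS = %w[idv_doc_auth idv_resolution].freeze

  attr_reader :current_idv_attempt, :unattributed_rate_limits

  # Attempts are plain hashes here for brevity.
  def initialize(current_idv_attempt: nil)
    @current_idv_attempt = current_idv_attempt
    @unattributed_rate_limits = []
  end

  # Yields the in-progress attempt only if one exists.
  def for_current_idv_attempt
    yield current_idv_attempt if current_idv_attempt
  end

  def handle_rate_limit_reached(event:)
    # Assumed property path for the limiter name.
    limiter = event.dig('properties', 'event_properties', 'limiter_type')

    if IDV_LIMITERS.include?(limiter)
      # IdV-related: attach to the attempt in progress (dropped if none).
      for_current_idv_attempt { |attempt| attempt[:rate_limited] = true }
    else
      # Not IdV-related (or unknown): keep it, but don't pin it on an attempt.
      unattributed_rate_limits << limiter
    end
  end
end
```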
    private

    def add_significant_event(
I'm not super happy about introducing another overload of the term "event" here, but the idea is that we capture only the most interesting things, in a form we can display to the end user.
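A sketch of what a "significant event" record might look like, assuming each one carries a type, a timestamp, and a human-readable description. The `SignificantEvent` struct, its fields, the `SignificantEventLog` class, and the keyword arguments shown for `add_significant_event` are all hypothetical, since the real parameter list is cut off in the diff above.

```ruby
# frozen_string_literal: true

require 'time' # for Time#iso8601

# Hypothetical shape of a curated, user-facing "significant event".
SignificantEvent = Struct.new(:type, :timestamp, :description, keyword_init: true)

class SignificantEventLog
  attr_reader :significant_events

  def initialize
    @significant_events = []
  end

  # Assumed signature; the real method's parameters are not shown in the diff.
  def add_significant_event(type:, timestamp:, description:)
    significant_events << SignificantEvent.new(
      type: type,
      timestamp: timestamp,
      description: description,
    )
  end

  # One line per significant event, oldest first, with UTC ISO 8601 times.
  def to_lines
    significant_events.sort_by(&:timestamp).map do |e|
      "#{e.timestamp.getutc.iso8601}: #{e.description}"
    end
  end
end
```

Keeping these records separate from the raw log events means the output only ever shows what was deliberately chosen as worth displaying.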
bin/summarize-user-events
    find_cloudwatch_events do |event|
      Time.zone ||= 'America/New_York'
Probably easier to set the time zone once, instead of for each event?
    - find_cloudwatch_events do |event|
    -   Time.zone ||= 'America/New_York'
    + Time.zone ||= 'America/New_York'
    + find_cloudwatch_events do |event|
I ran into an issue with this. I think that when the CloudWatch client uses multiple threads, each thread needs its time zone initialized.
Ah, I'll update it to use the configured zone.
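The behavior described in this thread can be modeled in plain Ruby: a value stored per thread is not inherited by newly spawned threads, so each worker sees `nil` unless a shared default exists. ActiveSupport's `Time.zone` works along these lines (the setter is per-thread, while `Time.zone_default` is process-wide), but the `ZoneSetting` module below is a hypothetical stand-in, not ActiveSupport code.

```ruby
# frozen_string_literal: true

# Hypothetical model of a per-thread setting with a process-wide fallback.
module ZoneSetting
  @default = nil

  class << self
    # Shared by every thread.
    attr_accessor :default

    # This thread's value if set, otherwise the shared default.
    def current
      Thread.current[:zone_setting] || default
    end

    # Sets the value for the calling thread only.
    def current=(zone)
      Thread.current[:zone_setting] = zone
    end
  end
end

# Setting the value on the main thread does not reach a new worker thread...
ZoneSetting.current = 'America/New_York'
worker_before_default = Thread.new { ZoneSetting.current }.value

# ...but a process-wide default does.
ZoneSetting.default = 'America/New_York'
worker_after_default = Thread.new { ZoneSetting.current }.value
```

Under this model, `worker_before_default` is `nil` and `worker_after_default` is `'America/New_York'`, which is why a configured default avoids initializing the zone in every thread.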
Hmm, maybe we need to update the CloudWatch client to call that block back on the main thread only?
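One way to do that, sketched under assumptions: worker threads only enqueue results onto a `Thread::Queue`, and the caller's block always runs on the thread that called the method. `each_result_on_calling_thread` and the `fetch_page` callable are hypothetical stand-ins for the real CloudWatch client and query code.

```ruby
# frozen_string_literal: true

# Fetches pages concurrently, but always yields rows on the calling thread.
# fetch_page is any callable that takes a page and returns an array of rows.
def each_result_on_calling_thread(pages, fetch_page:)
  queue = Thread::Queue.new
  done = Object.new # unique sentinel: cannot collide with a real row

  workers = pages.map do |page|
    Thread.new do
      # Workers only enqueue; they never call the block.
      fetch_page.call(page).each { |row| queue << row }
    ensure
      queue << done
    end
  end

  remaining = workers.size
  while remaining.positive?
    item = queue.pop
    if item.equal?(done)
      remaining -= 1
    else
      yield item # runs on the calling thread
    end
  end
ensure
  # Join workers; re-raises any exception that killed one of them.
  workers&.each(&:join)
end
```

Because the block never runs on a worker, per-thread state such as a time zone only needs to be set on the calling thread.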
* Initial cloudwatch query script to summarize events
* Query cloudwatch and get user events
* add timestamp remove limit
* [Hackathon] Allow sourcing events from stdin (#11619)
* Allow sourcing events from stdin
  It may be useful sometimes to take a local cache of cloudwatch events and pipe them into this command. [skip changelog]
* Add 'limit: 10000' to CW query
  This is required for `complete` to work
* [Hackathon] Add ExampleMatcher (#11622)
* Add ExampleMatcher
  Add an example matcher that just counts events and outputs how many it saw. [skip changelog]
* Remove excess whitespace
* Add frozen_string_literal: true
* use optparse to allow command options/defaults (#11627)
* [HACKATHON] Initial crack at an IdV matcher (#11624)
* Initial crack at an IdV matcher
  Matcher is a state machine that collects IDV "attempts" as they happen and tries to suss out interesting things about them. [skip changelog]
* removed unused method
---------
Co-authored-by: Douglas Price <douglas.price@gsa.gov>
* [HACKATHON] Output formatting tweaks (#11635)
* Normalize @timestamp to UTC for each event
  Pre-parse it in the script so that matchers don't have to worry about it
* Slightly improve output
  - Include timestamps where possible
  [skip changelog]
* [HACKATHON] Minor tweaks (#11637)
* Don't crash if no events found
* Tweak handling of --end-date
  - Use a dash rather than underscore
  - Make sure we respect it if it's passed in
* Sort events on stdin before processing
  Events from Cloudwatch queries will be sorted, but events on stdin are not guaranteed to be. Processing unsorted events can lead to weird, weird outcomes
* report on TrueID success/failure (#11638)
* Try to identify IDV abandonment (#11639)
  If the user:
  - Has not completed the initial workflow and
  - Does not have an idv-related event newer than 1 hour
  Call their attempt abandoned
* Login hackathon 2024 user narrative account deletion (#11629)
* include timestamp
* add account deletion narrative matcher
* remove unneeded matcher requirement
* add deletion matcher
* lint
* rename account deletion
* read events from file without changing stdin
* remove ipp from gpo code submission event
* update example documentation in script
* Update lib/event_summarizer/vendor_result_evaluators/aamva.rb
  Co-authored-by: Zach Margolis <zachmargolis@users.noreply.github.com>
* Start writing a spec
* Tidy up logic in IV result evaluator
* Set event['name'] if not already set
* Fix typo
* Use Eastern time zone by default
* Update pluralization code + add spec
* Start on spec for summarize-user-events command
* Protect rubocop's delicate sensibilities
* Add more specs
  Add some specs around option parsing, time parsing, and actually running the program
* Look at banner michael
---------
Co-authored-by: Malick Diarra <malick.diarra@gsa.gov>
Co-authored-by: Doug Price <douglas.price@gsa.gov>
Co-authored-by: Eileen <eileenmcfarland@navapbc.com>
Co-authored-by: eileen-nava <80347702+eileen-nava@users.noreply.github.com>
Co-authored-by: Zach Margolis <zachmargolis@users.noreply.github.com>
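The abandonment rule from #11639 is compact enough to sketch directly. This version assumes events are hashes whose `'@timestamp'` is already parsed into a `Time` (as the "Normalize @timestamp to UTC" commit suggests), measures "newer than 1 hour" against a caller-supplied reference time `now`, and takes the name of the completion event as a parameter. `abandoned?`, `ABANDONMENT_WINDOW`, and the event names in the test are hypothetical. It also sorts first, since events from stdin are not guaranteed to be ordered.

```ruby
# frozen_string_literal: true

ABANDONMENT_WINDOW = 60 * 60 # one hour, in seconds

# idv_events: hashes with 'name' and a parsed Time in '@timestamp'.
# now: the reference time to measure "newer than 1 hour" against (assumed to
#      be supplied by the caller, e.g. the end of the query range).
def abandoned?(idv_events, now:, completion_event:)
  return false if idv_events.empty?

  # Input from stdin may be unordered, so sort before relying on order.
  sorted = idv_events.sort_by { |e| e['@timestamp'] }
  completed = sorted.any? { |e| e['name'] == completion_event }
  newest = sorted.last['@timestamp']

  # Abandoned: never completed, and no IdV-related event within the window.
  !completed && (now - newest) > ABANDONMENT_WINDOW
end
```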
This PR adds a big, giant "IDV" matcher.
The matcher runs through event logs and observes when new attempts at identity verification start. It then tries to describe how they went.
Things it does well:
Things it does not do so well:
Ultimately, identity verification is a "big" thing, and having a facility for sub-matchers rather than one monolithic matcher would probably be good. I don't think this PR represents the final form, but it does represent a form, and starts to capture some of the built-in knowledge you need to interpret IDV-related event logs.
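One possible shape for the sub-matcher idea, sketched under assumptions: each sub-matcher exposes `handle_event` and `summary`, and a composite fans every event out to all of them. The class names and this interface are hypothetical, not the interface used in this PR; `CountingSubMatcher` just counts one event name, roughly in the spirit of the ExampleMatcher mentioned in the commit list.

```ruby
# frozen_string_literal: true

# Hypothetical sub-matcher that counts occurrences of a single event name.
class CountingSubMatcher
  def initialize(label, event_name)
    @label = label
    @event_name = event_name
    @count = 0
  end

  def handle_event(event)
    @count += 1 if event['name'] == @event_name
  end

  def summary
    "#{@label}: #{@count}"
  end
end

# Hypothetical composite that presents many sub-matchers as one matcher.
class CompositeMatcher
  def initialize(sub_matchers)
    @sub_matchers = sub_matchers
  end

  # Fan each event out to every sub-matcher.
  def handle_event(event)
    @sub_matchers.each { |m| m.handle_event(event) }
  end

  # One summary entry per sub-matcher, in registration order.
  def summary
    @sub_matchers.map(&:summary)
  end
end
```

Giving each sub-matcher one slice of IdV would let it be tested in isolation, while the composite keeps a single entry point for the script.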